Abstract

We propose the integration of a relational specification framework 
within a dependent type system capable of verifying complex in- 
variants over the shapes of algebraic datatypes. Our approach is 
based on the observation that structural properties of such datatypes 
can often be naturally expressed as inductively-defined relations 
over the recursive structure evident in their definitions. By inter- 
preting constructor applications (abstractly) in a relational domain, 
we can define expressive relational abstractions for a variety of 
complex data structures, whose structural and shape invariants can 
be automatically verified. Our specification language also allows 
for definitions of parametric relations for polymorphic data types 
that enable highly composable specifications and naturally gener- 
alizes to higher-order polymorphic functions. 

We describe an algorithm that translates relational specifications 
into a decidable fragment of first-order logic that can be efficiently 
discharged by an SMT solver. We have implemented these ideas 
in a type checker called CATALYST that is incorporated within 
the MLton SML compiler. Experimental results and case studies 
indicate that our verification strategy is both practical and effective. 

Categories and Subject Descriptors D.3.2 [Language Classifi- 
cations]: Applicative (Functional) Languages; F.3.1 [Logics and 
Meanings of Programs]: Specifying and Verifying and Reason- 
ing about Programs; D.2.4 [Software Engineering] : Software/Pro- 
gram Verification 

Keywords Relational Specifications; Inductive Relations; Para- 
metric Relations; Dependent Types; Decidability; Standard ML 

1. Introduction 

Dependent types are well-studied vehicles capable of expressing 
rich program invariants. A prototypical example is the type of 
a list that is indexed by a natural number denoting its length. 
Length-indexed lists can be written in several mainstream lan- 
guages that support some form of dependent typing, including 
GHC Haskell, F* [23], and OCaml [16]. For example, the
following Haskell signatures specify how the length of the result 
list for append and rev relate to their arguments: 

append :: List a n -> List a m -> List a (Plus n m)
rev :: List a n -> List a n

While length-indexed lists capture stronger invariants over append
and rev than is possible with simple types alone, they still
underspecify the intended behavior of these operations. For example, a
correctly written append function must additionally preserve the 
order of its input lists; a function that incorrectly produces an out- 
put list that is a permutation of its inputs would nonetheless satisfy 
append 's type as written above. Similarly, the identity function 
would clearly satisfy the type given for rev ; a type that fully cap- 
tures rev 's behavior would also have to specify that the order of 
elements in rev 's output list is the inverse of the order of its input. 
Is it possible to ascribe such expressive types to capture these kinds 
of important shape properties, which can nonetheless be easily 
stated, and efficiently checked? 

One approach is to directly state desired behavior in type refine- 
ments, as in the following signature: 

rev : {l : 'a list} -> {v : 'a list | v = rev'(l)}

Here, rev' represents some reference implementation of rev. 
Checking rev 's implementation against this refinement is tanta- 
mount to proving the equivalence of rev and rev' . Given the 
undecidability of the general problem, expecting these types to be 
machine checkable would require the definition of rev ' to closely 
resemble rev 's. For all but the most trivial of definitions, this ap- 
proach is unlikely to be fruitful. An alternative approach is to de- 
fine rev within a theorem prover, and directly assert and prove 
properties on it - for example, that rev is involutive. Although 
modern theorem provers support rich theories over datatypes like 
lists, this strategy nonetheless requires that the program be fully 
described in logic, and reasoned about by the solver in its entirety. 
Thus, defining rev in this way also requires an equational defi- 
nition of append , assuming the former is defined in terms of the 
latter. For non-trivial programs, this may require equipping provers 
with arbitrarily complex theories, whose combination may not be 
decidable. Such a methodology also does not obviously address our 
original goal of specifying rev 's functional correctness, indepen- 
dent of its definition; note that in the case of rev , involution does 
not imply functional correctness. Clearly, the challenges in building 
suitably typed definitions that let us reason about interesting shape 
properties of a data structure are substantial. 

Nonetheless, the way the length of a list is tracked using its 
length-indexed type offers a useful hint about how we can reason 
about its shape. Akin to the Nat domain that indexes a list type 
with a length abstraction, we need an appropriate abstract domain 
that we can use to help us reason about a list's shape properties. 
For instance, in the case of list reversal, the abstract domain should 
allow us to structurally reason about the order of elements in the 
input and output lists. A useful interpretation of a list order that 
satisfies this requirement would be one that relates every element 
in a list with every other element based on an ordering predicate
(e.g., occurs-before or occurs-after). By defining an exhaustive 
enumeration of the set of all such pairs under this ordering, we 
can effectively specify the total order of all elements in the list. 
More precisely, observe that the notion of order can be broken down 
to the level of a binary relation over elements in the list, with the 
transitive closure of such a relation effectively serving as a faithful
representation. 

For example, consider a relation Rob that relates a list to a pair if
the first element in the pair occurs before the second in the list. For
a concrete list l = [x1,x2,x3], the relation's closure Rob* would
be:

{(l, ⟨x1, x2⟩), (l, ⟨x1, x3⟩), (l, ⟨x2, x3⟩)}¹

Conversely, an occurs-after (Roa) relation serves as the semantic
inverse of occurs-before; given these two relations, we can specify
the following type for rev:

rev : {l : 'a list} -> {v : 'a list | Rob*(l) = Roa*(v)}

Since Rob*(l) represents the set of pairs whose elements exhibit
the occurs-before property in the input list, and Roa*(v) represents
the set of pairs whose elements exhibit the occurs-after property
in the output list, the above specification effectively asserts that for
every pair of elements x and y in the input list l, if x occurs before
y in l, then x has to occur after y in the result list v.

This property succinctly captures the fact that the result list is 
the same as the original list in reverse order without appealing to the 
operational definition of how the result list is constructed from the 
input. By using a relational domain to reason about the shape of the 
list, we avoid having to construct a statically checkable reference 
implementation of rev . 
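
The content of this specification can be illustrated outside the type system. The following Python sketch is only a model of the relational semantics, not CATALYST's specification syntax: it assumes lists are ordinary Python lists, represents Rob*(l) and Roa*(l) as sets of pairs, and uses hypothetical function names (occurs_before, occurs_after, satisfies_rev_spec) that do not appear in the paper.

```python
def occurs_before(xs):
    """Model of Rob*(xs): pairs (a, b) where a occurs before b in xs."""
    return {(xs[i], xs[j])
            for i in range(len(xs))
            for j in range(i + 1, len(xs))}


def occurs_after(xs):
    """Model of Roa*(xs): pairs (a, b) where a occurs after b in xs."""
    return {(b, a) for (a, b) in occurs_before(xs)}


def satisfies_rev_spec(inp, out):
    """The refinement on rev's result: Rob*(inp) = Roa*(out)."""
    return occurs_before(inp) == occurs_after(out)


# A reversed list satisfies the refinement; the identity function,
# which satisfies the length-indexed type, does not.
print(satisfies_rev_spec([1, 2, 3], [3, 2, 1]))
print(satisfies_rev_spec([1, 2, 3], [1, 2, 3]))
```

Note that the check never consults a reference implementation of rev; it only compares two sets of pairs, which is what makes the relational formulation amenable to automation.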

We refer to operators like Rob and Roa as structural relations
because they explicitly describe structural properties of a data
structure. Such relations can be used as appropriate abstract do- 
mains to reason about the shapes of structures generated by con- 
structor applications in algebraic data types. Given that relations 
naturally translate to sets of tuples, standard set operations such 
as union and cross-product are typically sufficient to build useful 
relational abstractions from any concrete domain. This simplicity 
makes relational specifications highly amenable for automatic ver- 
ification. 

The type of rev given above captures its functional behavior 
by referring to the order of elements in its argument and result lists. 
However, the notion of order as a relation between elements of the 
list is not always sufficient. For example, consider the function, 

dup : 'a list -> ('a * 'a) list

that duplicates the elements in its input list. An invariant that we 
can expect of any correct implementation is that the order of left 
components of pairs in the output list is the same as the order of its 
right components, and both are equal to the order of elements in the 
input list. Clearly, our definitions of Rob and Roa as relations over
elements in a list are insufficient to express the order of individual 
components of pairs in a list of pairs. How do we construct general 
definitions that let us capture ordering invariants over different 
kinds of lists without generating distinct relations for each kind? 

We address this issue by allowing structural relations defined 
over a polymorphic data type to be parameterized by relations 
over type variables in the data type. For instance, the Rob relation
defined over a 'a list can be parameterized by a polymorphic
relation R over 'a. Instead of directly relating the order of two
elements x and y in a polymorphic list, a parametric occurs-before
relation generically relates the ordering of R(x) and R(y); R's
specific instantiation would draw from the set of relations defined
over the data type that instantiates the type variable ('a). In the
case of dup, Rob could be instantiated with relations like Rfst
and Rsnd that project the first and second elements of the pairs in
dup's output list. The ability to parameterize relations in this way
allows structural relations to be used seamlessly with higher-order
polymorphic functions, and enables composable specifications over
defined relations.

1 Given a relation R = {(x, y1), (x, y2), ..., (x, yn)} where x is an
instance of some datatype, and the yi are tuples that capture some shape
property of interest, we write R(x) as shorthand for {y1, y2, ..., yn}.
Thus, Rob*(l) = {⟨x1,x2⟩, ⟨x1,x3⟩, ⟨x2,x3⟩}.

In this paper, we present an automated verification framework 
integrated within a refinement type system to express and check 
specifications of the kind given above. We describe a specification 
language based on relational algebra to define and compose struc- 
tural relations for any algebraic data type. These definitions are 
only as complex as the data type definition itself in the sense that it 
is possible to construct equivalent relational definitions directly su- 
perimposed on the data type. Relations thus defined, including their 
automatically generated inductive variants, can be used to specify 
shape invariants and other relational properties. Our typechecking 
procedure verifies specifications by interpreting constructor appli- 
cations as set operations within these abstract relational domains. 
Typechecking in our system is decidable, a result which follows 
from the completeness of encoding our specification language in a 
decidable logic. 

The paper makes the following contributions: 

1. We present a rich specification language for expressing refine- 
ments that are given in terms of relational expressions and fa- 
miliar relational algebraic operations. The language is equipped 
with pattern-matching operations over constructors of algebraic 
data types, thus allowing the definition of useful shape proper- 
ties in terms of relational constraints. 

2. To allow relational refinements to express shape properties over 
complex data structures, and to be effective in defining such 
properties on higher-order programs, we allow the inductive re- 
lations found in type refinements to be parameterized over other 
inductively defined relations. While the semantics of a relation- 
ally parametric specification can be understood intuitively in 
second-order logic, we show that it can be equivalently encoded 
in a decidable fragment of first-order logic, leading to a practi- 
cal and efficient type-checking algorithm. 

3. We present a formalization of our ideas, including a static 
semantics, meta-theory that establishes the soundness of well- 
typed programs, a translation mechanism that maps well-typed 
relational expressions and refinements to a decidable many- 
sorted first-order logic, and a decidability result that justifies 
the translation scheme. 

4. We describe an implementation of these ideas in a type checker 
called CATALYST that is incorporated within the MLton Stan- 
dard ML compiler, and demonstrate the utility of these ideas 
through a series of examples, including a detailed case study 
that automatically verifies the correctness of α-conversion and
capture-avoiding substitution operations of the untyped lambda 
calculus, whose types are expressed using relational expres- 
sions. 

The remainder of the paper is structured as follows. In the next 
section, we present additional motivation and examples for our 
ideas. Sec. 3 formalizes the syntax and static semantics of relational
refinements in the context of a simply-typed core language. Sec. 4
extends the formalization to support parametric refinements within
a polymorphic core language. Our formalization also presents a
translation scheme from relational refinements to a decidable
first-order logic. Details about the implementation are given in Sec. 5.
Sec. 6 presents a case study. Secs. 7, 8, and 9 present related work,
directions for future work, and conclusions, respectively.
2. Structural Relations

Our specification language is primarily the language of relational 
expressions composed using familiar relational algebraic operators. 
This language is additionally equipped with pattern matching over 
constructors of algebraic types to define shape properties in terms 
of these expressions. A number of built-in polymorphic relations 
are provided, the most important of which are listed below: 

Rid (x) = {(x)}

Rdup (x) = {(x, x)}

RnotEq_k (x) = {(x)} - {(k)}

Req_k (x) = {(x)} - ({(x)} - {(k)})

Rid is the identity relation, Rdup is a relation that associates a
value with a pair that duplicates that value, RnotEq_k is a relation
indexed by a constant k (of some base type) that relates x to itself,
provided x is not equal to k, and Req_k is defined similarly, except
it relates x to itself exactly when x is equal to k. Apart from the
relations defined above, the language also includes the primitive
relation ∅ that denotes the empty set.
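
The set-difference encodings of RnotEq_k and Req_k are easy to misread, so a small executable model may help. The sketch below is an illustration in Python rather than the specification language itself; it assumes tuples are Python tuples and relations are functions returning sets of tuples, and the names r_id, r_dup, r_not_eq, and r_eq are hypothetical.

```python
def r_id(x):
    """Rid(x) = {(x)}"""
    return {(x,)}


def r_dup(x):
    """Rdup(x) = {(x, x)}"""
    return {(x, x)}


def r_not_eq(k):
    """RnotEq_k(x) = {(x)} - {(k)}: empty exactly when x equals k."""
    return lambda x: {(x,)} - {(k,)}


def r_eq(k):
    """Req_k(x) = {(x)} - ({(x)} - {(k)}): non-empty exactly when x equals k."""
    return lambda x: {(x,)} - ({(x,)} - {(k,)})


print(r_eq(3)(3), r_eq(3)(4))
print(r_not_eq(3)(3), r_not_eq(3)(4))
```

The double difference in Req_k works because the inner difference {(x)} - {(k)} is empty precisely when x equals k, so subtracting it from {(x)} leaves {(x)} in that case and the empty set otherwise.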

To see how new structural relations can be built using relational 
operators, primitive relations, and pattern-match syntax, consider 
the specification of the list-head relation that relates a list to its 
head element: 

relation Rhd (x::xs) = {(x)}
       | Rhd []      = ∅

For a concrete list l, Rhd(l) produces the set of unary tuples
whose elements are in the head relation with l. This set is clearly
a singleton when the list is non-empty and empty otherwise. The
above definition states that for any list pattern constructed using ::
whose head is represented by pattern variable x and whose tail is
represented by pattern variable xs, (1) (x::xs, x) ∈ Rhd, and (2)
there does not exist an x' such that x' ≠ x and (x::xs, x') ∈
Rhd. The declarative syntax of the kind shown above is the primary
means of defining structural relations in our system.

2.1 Relational Composition 

Simple structural relations such as Rhd have fixed cardinality, i.e.,
they have a fixed number of tuples regardless of the concrete size 
of the data structure on which they are defined. However, practical 
verification problems require relations over algebraic datatypes to 
have cardinality comparable to the size of the data structure, which 
may be recursive. 

For example, the problem of verifying that an implementation
of rev reverses the ordering of its input requires specifying a
membership relation (Rmem) that relates a list l to every element
in l (regardless of l's size). This relation would allow us to
define an ordering property such as occurs-before or occurs-after
on precisely those elements that comprise rev's input and output
lists. A recursive definition of Rmem looks like:²

Rmem (x::xs) = {(x)} ∪ Rmem (xs)

We can equivalently express Rmem as an inductive extension of the
head relation Rhd defined above. Suppose R is a structural relation
that relates a list l of type 'a list with elements v of type 'a.
Then, the inductive extension of R (written R*) is the least relation
that satisfies the following conditions:

• if (l, v) ∈ R, then (l, v) ∈ R*

• if l = x::xs and (xs, v) ∈ R*, then (l, v) ∈ R*



2 In some of our examples, we elide the case for the empty list, which defaults
to the empty set. 



Thus, Rmem = Rhd*. We can think of the induction operator as a
controlled abstraction for structural recursion. Based on the recur- 
sive structure of an algebraic data type, sophisticated inductive def- 
initions can be generated from simple structural relations defined 
for that data type. 

Equipped with Rmem, we can now precisely define the occurs-before
relation introduced earlier. Because Rob relates a list to a pair
whose first element is the head of the list, and whose second
element is a member of its tail, it can be expressed in terms of
Rmem thus:

relation Rob (x::xs) = {(x)} × Rmem (xs)

The transitive closure of this relation, Rob*, expresses the
occurs-before property on every element in the list. The occurs-after
relation can be defined similarly:

relation Roa (x::xs) = Rmem (xs) × {(x)}
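
To make the roles of the pattern-match definitions and the induction operator concrete, the following Python sketch models them directly. It is a hedged illustration, not the CATALYST implementation: relations are assumed to be functions from a list to a set of tuples, and the helper names cross, star, r_hd, r_mem, r_ob, and r_oa are hypothetical stand-ins for ×, *, Rhd, Rmem, Rob, and Roa.

```python
def cross(a, b):
    """Cross product of two sets of tuples; member tuples are concatenated."""
    return {s + t for s in a for t in b}


def star(rel):
    """Inductive extension R*: R*(l) = R(l) ∪ R*(tail of l)."""
    def closed(xs):
        result = set(rel(xs))
        if xs:
            result |= closed(xs[1:])
        return result
    return closed


def r_hd(xs):
    """Rhd (x::xs) = {(x)}, Rhd [] = ∅"""
    return {(xs[0],)} if xs else set()


# Rmem = Rhd*
r_mem = star(r_hd)


def r_ob(xs):
    """Rob (x::xs) = {(x)} × Rmem(xs); the empty-list case defaults to ∅."""
    return cross({(xs[0],)}, r_mem(xs[1:])) if xs else set()


def r_oa(xs):
    """Roa (x::xs) = Rmem(xs) × {(x)}; the empty-list case defaults to ∅."""
    return cross(r_mem(xs[1:]), {(xs[0],)}) if xs else set()


print(r_mem([1, 2, 3]))
print(star(r_ob)([1, 2, 3]))
print(star(r_oa)([1, 2, 3]))
```

Each relation is written once, for a single constructor application; the recursion over the whole list is supplied entirely by star, which mirrors the two conditions that define R*.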

2.2 Parametric Relations 

Consider how we might specify a zip function over lists, with the 
following type: 

zip : 'a list -> 'b list -> ('a * 'b) list

Any correct implementation of zip must guarantee that the ele- 
ments of the output list are pairs of elements drawn from both ar- 
gument lists. The Rmem relation defined above provides much of 
the functionality we require to specify this invariant; intuitively, the 
specification should indicate that the first (resp. second) element of 
every pair in the output list is in a membership relation with zip 's 
first (resp. second) argument. Unfortunately, as currently defined, 
Rmem operates directly on the pair elements of the output, not the 
pair's individual components. What we require is a mechanism that 
allows Rmem to assert the membership property on the pair's com- 
ponents (rather than the pair directly). 

To do this, we allow structural relations to be parameterized 
over other relations. In the case of zip , the parameterized member- 
ship relation can be instantiated with the appropriate relationally- 
defined projections on a pair type. Concretely, given new param- 
eterized definitions of Rhd and Rmem, and related auxiliary rela- 
tions: 

relation (Rhd R) (x::xs) = R(x)
       | (Rhd R) []      = ∅

relation (Rmem R) = (Rhd R)*
relation Rfst (x,y) = {(x)}
relation Rsnd (x,y) = {(y)}

zip can now be assigned the following type that faithfully captures
the membership relation between its input lists and its output:³

zip : l1 -> l2 ->
      {v | ((Rmem Rfst) v) = ((Rmem Rid) l1)
         ∧ ((Rmem Rsnd) v) = ((Rmem Rid) l2)}

Similarly, we can define parametric versions of Rob and Roa:

relation (Rob R) (x::xs) = R(x) × ((Rmem R) xs)
relation (Roa R) (x::xs) = ((Rmem R) xs) × R(x)

Using this parametric version of Rob, the dup function described
in the previous section can now be specified thus:

dup : l -> {v | ((Rob Rfst)* v) = ((Rob Rid)* l)
              ∧ ((Rob Rsnd)* v) = ((Rob Rid)* l)}

3 We drop ML types from dependent type specifications when obvious from 
context. 
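
The parametric definitions above can be modelled as higher-order functions. The Python sketch below is an assumption-laden illustration, not the paper's implementation: a parametric relation is modelled as a function that takes a relation on elements and returns a relation on lists, pairs are Python 2-tuples, and all names (cross, star, r_hd, r_mem, r_ob, zip_spec, dup_spec) are hypothetical.

```python
def cross(a, b):
    """Cross product of two sets of tuples; member tuples are concatenated."""
    return {s + t for s in a for t in b}


def star(rel):
    """Inductive extension R*: R*(l) = R(l) ∪ R*(tail of l)."""
    def closed(xs):
        result = set(rel(xs))
        if xs:
            result |= closed(xs[1:])
        return result
    return closed


def r_id(x):
    return {(x,)}


def r_fst(p):
    return {(p[0],)}


def r_snd(p):
    return {(p[1],)}


def r_hd(r):
    """(Rhd R) (x::xs) = R(x), (Rhd R) [] = ∅"""
    return lambda xs: r(xs[0]) if xs else set()


def r_mem(r):
    """(Rmem R) = (Rhd R)*"""
    return star(r_hd(r))


def r_ob(r):
    """(Rob R) (x::xs) = R(x) × ((Rmem R) xs)"""
    return lambda xs: cross(r(xs[0]), r_mem(r)(xs[1:])) if xs else set()


def zip_spec(l1, l2, v):
    """Refinement on zip's result, as given in the text."""
    return (r_mem(r_fst)(v) == r_mem(r_id)(l1)
            and r_mem(r_snd)(v) == r_mem(r_id)(l2))


def dup_spec(l, v):
    """Refinement on dup's result, as given in the text."""
    order = star(r_ob(r_id))(l)
    return (star(r_ob(r_fst))(v) == order
            and star(r_ob(r_snd))(v) == order)


print(zip_spec([1, 2], ["a", "b"], [(1, "a"), (2, "b")]))
print(dup_spec([1, 2, 3], [(1, 1), (2, 2), (3, 3)]))
```

The same r_mem and r_ob serve lists of plain elements (instantiated with r_id) and lists of pairs (instantiated with r_fst or r_snd), which is the reuse that parameterization is meant to provide.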
2.3 Parametric Dependent Types

Our specification language also allows dependent types to be pa- 
rameterized over relations used in type refinements. In the spirit of 
type variables, we use relation variables to denote parameterized 
relations in a type. To illustrate why such parameterization is useful,
consider the following signature for foldl:

('Rbm) foldl :
  {l : 'a list} -> {b : 'b} ->
  ({f : {x : 'a} -> {acc : 'b} ->
        {z : 'b | 'Rbm(z) = {(x)} ∪ 'Rbm(acc)}}) ->
  {v | 'Rbm(v) = Rmem(l) ∪ 'Rbm(b)}

This type relates membership properties on foldl's input list,
expressed in terms of a non-parametric Rmem relation, to an abstract
notion of membership over its result type ('b) captured using a
relation variable ('Rbm). This signature constrains foldl to produce
a result for which a membership property is a sensible notion. For
instance, if foldl were applied to arguments in which b was of
some list type (e.g., []) because it is used as a list transform
operator, then 'Rbm could be trivially instantiated with Rmem. However,
allowing types to be parameterized over relation variables enables
richer properties to be expressed. For example, consider the
function makeTree that uses foldl to generate a binary tree using
function treeInsert (not shown):

datatype 'a tree = Leaf
                 | Tree of 'a * ('a tree) * ('a tree)

relation Rthd Leaf = ∅
       | Rthd (Tree (x, t1, t2)) = {(x)}
relation Rtmem = Rthd*

makeTree : {l : 'a list} ->
           {v : 'a tree | Rtmem(v) = Rmem(l)}

val makeTree = fn l =>
  foldl (Rtmem) l Leaf treeInsert

Function makeTree uses foldl by first instantiating the relation
variable 'Rbm in the type of foldl to Rtmem. The resultant type
of foldl requires its higher-order argument to construct a tree
using members of its tree argument (acc), and the list element
(x) to which it is applied. In return, foldl guarantees to produce
a tree, which contains all the members of its list argument. It
should be noted that a correct implementation of treeInsert will
have the required type of foldl's higher-order argument, after
instantiating 'Rbm to Rtmem. Thus, the application of foldl in
the above example typechecks, producing the required invariant of
makeTree.
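
The membership invariant of makeTree can be exercised on concrete values. The Python sketch below is illustrative only: the paper does not show treeInsert, so tree_insert here is an assumed ordinary binary-search-tree insertion; Leaf is modelled as None and Tree (x, t1, t2) as a 3-tuple; foldl follows the SML argument order, where f receives the element first and the accumulator second.

```python
def foldl(f, b, xs):
    """SML-style foldl: f is applied to (element, accumulator), left to right."""
    acc = b
    for x in xs:
        acc = f(x, acc)
    return acc


def tree_insert(x, t):
    """Hypothetical treeInsert (not from the paper): plain BST insertion."""
    if t is None:
        return (x, None, None)
    v, left, right = t
    if x < v:
        return (v, tree_insert(x, left), right)
    return (v, left, tree_insert(x, right))


def r_thd(t):
    """Rthd Leaf = ∅, Rthd (Tree (x, t1, t2)) = {(x)}"""
    return set() if t is None else {(t[0],)}


def r_tmem(t):
    """Rtmem = Rthd*: the root of t together with Rtmem of both subtrees."""
    if t is None:
        return r_thd(t)
    _, left, right = t
    return r_thd(t) | r_tmem(left) | r_tmem(right)


def r_mem(xs):
    """Rmem on lists, as a set of unary tuples."""
    return {(x,) for x in xs}


def make_tree(xs):
    return foldl(tree_insert, None, xs)


tree = make_tree([3, 1, 2])
# Refinement of makeTree: Rtmem(v) = Rmem(l)
print(r_tmem(tree) == r_mem([3, 1, 2]))
# Refinement required of foldl's argument f: 'Rbm(z) = {(x)} ∪ 'Rbm(acc)
print(r_tmem(tree_insert(5, tree)) == {(5,)} | r_tmem(tree))
```

A run-time check like this only samples the invariant on particular inputs; the point of the type-based approach is that the same two equations are established once, statically, for all inputs.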

The type of foldl can also be parameterized over an abstract notion
of membership for type variable 'a, captured by another relation
variable ('Ram) to state a more general membership invariant.
Concretely, this requires that the tuple ({(x)}) in the type refinement
of the higher-order argument (f) be replaced with 'Ram(x), and the
non-parametric Rmem relation in the result type refinement be
substituted with a parametric (Rmem 'Ram) relation. In cases when
there does not exist any useful notion of membership for types that
instantiate 'a and 'b, relation variables 'Ram and 'Rbm can be
instantiated with ∅ to yield tautological type refinements.

An alternative type for foldl could relate the order of elements 
in the argument list to some order of the result. The intuition is 
as follows: suppose the result type ( 'b) has some notion of order 
captured by a relation such that the result of foldl 's higher-order 
argument ( f ) has a refinement given in terms of this relation; i.e., it 
says something about how the order relation of its result ( z ) relates 
to its arguments (x and acc). But, x comes from the list being 
folded, and f is applied over elements of this list in a pre-defined 
order. Therefore, we can express invariants that relate the order of
the input list to the order of the result type, given that we know the
order in which f is applied over the list. The type of foldl that
tries to match the abstract order ('Rbo) on the result type ('b) to an
occurs-after order on the input list is shown below. For brevity, we
avoid reproducing membership invariants from the type of foldl
from the previous example, using ellipses in their place:

('Rbm, 'Rbo) foldl : {l : 'a list} -> {b : 'b} ->
  ({f : {x : 'a} -> {acc : 'b} ->
        {z | 'Rbo(z) = ({(x)} × 'Rbm(acc)) ∪ 'Rbo(acc) ∧ ...}}) ->
  {v | 'Rbo(v) = (Roa*(l) ∪ 'Rbo(b)) ∪
                 (Rmem(l) × 'Rbm(b)) ∧ ...}

An implementation of rev that uses foldl is given below: 

rev : {l : 'a list} -> {v : 'a list | Rob*(v) = Roa*(l)}

val Cons = fn x => fn xs => x::xs

val rev = fn l => foldl (Rmem, Rob*) l [] Cons

Our type checker successfully typechecks the above program, given 
the standard definition of foldl . Note that, due to the difference 
in the order in which the higher-order argument is applied over 
the input list, the type of foldr will be necessarily different from
foldl . Consequently, using foldr instead of foldl in the above 
program fails type checking, as would be expected. 
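
The difference between the two folds can be observed on concrete lists. The following Python sketch is a run-time illustration under stated assumptions, not the static check performed by the type checker: foldl and foldr follow the SML argument order (element first, accumulator second), 'Rbo is instantiated with Rob* and 'Rbm with Rmem as in the rev example, and the function names are hypothetical.

```python
def foldl(f, b, xs):
    """SML-style foldl: processes xs from left to right."""
    acc = b
    for x in xs:
        acc = f(x, acc)
    return acc


def foldr(f, b, xs):
    """SML-style foldr: processes xs from right to left."""
    acc = b
    for x in reversed(xs):
        acc = f(x, acc)
    return acc


def cons(x, xs):
    return [x] + xs


def occurs_before(xs):
    """Model of Rob*(xs)."""
    return {(xs[i], xs[j])
            for i in range(len(xs))
            for j in range(i + 1, len(xs))}


def occurs_after(xs):
    """Model of Roa*(xs)."""
    return {(b, a) for (a, b) in occurs_before(xs)}


def step_invariant(x, acc):
    """Refinement on f with 'Rbo := Rob*, 'Rbm := Rmem, and f := Cons:
    Rob*(z) = ({(x)} × Rmem(acc)) ∪ Rob*(acc), where z = x::acc."""
    z = cons(x, acc)
    return occurs_before(z) == ({(x, a) for a in acc} | occurs_before(acc))


l = [1, 2, 3]
via_foldl = foldl(cons, [], l)
via_foldr = foldr(cons, [], l)
print(via_foldl, occurs_before(via_foldl) == occurs_after(l))
print(via_foldr, occurs_before(via_foldr) == occurs_after(l))
```

With foldl the result is the reversed list and the refinement Rob*(v) = Roa*(l) holds; with foldr the result is the input list itself, so the same refinement fails for this input.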

3. Core language 

3.1 Syntax 

We formalize our ideas using a core calculus (λ_R) shown in Fig. 1,
an A-normalized extension of the simply-typed lambda calculus.
The language supports a primitive type (int), a recursive data type
(intlist), along with dependent base and function types. Because
the mechanisms and syntax to define and elaborate recursive data
types are kept separate from the core, λ_R is only provided with two
constructors, Nil and Cons, used to build lists. The language has a
standard call-by-value operational semantics, details of which can
be found in an accompanying technical report [10].⁴

Dependent type refinements (φ) in λ_R are assertions over
relational expressions (r); these expressions, which are themselves
typed, constitute the syntactic class of expressions in our
specification language. We refer to the types of relational expressions as
sorts, in order to distinguish them from λ_R types. We write r :: s
to denote that a relational expression r has sort s. A structural
relation is a triple, consisting of a unique relation name, its sort, and
its definition as (a) a pattern-match sequence that relates
constructors of an algebraic data type to a relation expression, or (b) an
inductive extension of an existing relation, captured using the
closure operator (*). We write R ≜ δ to denote that a relation R has a
(pattern-match or inductive) definition δ.

4 Proofs for all lemmas and theorems given in this paper are also provided
in the report.

A structural relation maps a value to a set of tuples (θ). We
use ":->" to distinguish such maps from the mapping expressed by
dependent function types. For example, the notation:

Rob :: intlist :-> {int * int}

indicates that the sort of relation Rob is a map from integer lists
to pairs. As reflected by the syntactic class of relation sorts (τ_R),
the domain of a λ_R relation is either intlist or int. For the purposes
of the formalization, we assume the existence of a single primitive
relation Rid whose sort is int :-> {int} that defines an identity
relation on integers.

3.2 Sorts, Types and Well-formedness 

Fig. 3 defines rules to check sorts of structural relations and
relational expressions, establish well-formedness conditions of type
refinements, and type-check expressions. The judgments defined by
these rules make use of environment Γ, defined as follows:

Γ ::= · | Γ, x : τ | Γ, φ

Environments are ordered sets of assertions that make up a typing
context. Assertions are either (a) type bindings for variables, or
(b) type refinements that reflect branch conditions collected from
match expressions. We assume that any variable is bound only
once in Γ.

Structural relations are sort checked under an empty type envi- 
ronment. The rule S-REL type checks a relation definition by en- 
suring that relational expressions associated with the constructors 
that comprise the definition all have the same sort. The rule
S-REL-STAR captures the fact that an inductive extension of a relation has
the same type as the relation itself. The rule S-APP sort checks
relation applications by ensuring that the argument to the relation
has the required simple (non-dependent) type. The rule makes use
of a simple typing judgment (⊢) under a refinement-erased Γ
(denoted ||Γ||) for this purpose. Rules for simple typing judgments are
straightforward, and are elided here; the full set of rules can be
found in the accompanying technical report [10].

Figure 4: Semantics of Specification Language

Refinement erasure on a dependent base type (τ) sets its type
refinement to true, effectively erasing the refinement to yield a
simple type. For function types, erasure is defined recursively:

||{ν : T | φ}|| = T        ||(x : τ1) -> τ2|| = ||τ1|| -> ||τ2||

Refinement erasure for type environments performs erasure over
all type bindings within the environment, in addition to erasing all
recorded branch conditions. For an empty environment, refinement
erasure is an identity.

||Γ, x : τ|| = ||Γ||, x : ||τ||        ||Γ, φ|| = ||Γ||
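
Refinement erasure is a simple structural recursion, which the following Python sketch spells out. The data representation is an assumption made for illustration and is not taken from the paper or from CATALYST: dependent base types and dependent function types are modelled as the hypothetical classes Base and Fun, simple function types as Arrow, refinements as opaque strings, and an environment as a list of tagged entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Base:
    """Dependent base type {ν : T | φ}; φ is kept as an opaque string."""
    simple: str
    refinement: str


@dataclass(frozen=True)
class Fun:
    """Dependent function type (x : τ1) -> τ2."""
    var: str
    arg: object
    res: object


@dataclass(frozen=True)
class Arrow:
    """Simple (non-dependent) function type T1 -> T2."""
    arg: object
    res: object


def erase(t):
    """||{ν : T | φ}|| = T and ||(x : τ1) -> τ2|| = ||τ1|| -> ||τ2||."""
    if isinstance(t, Base):
        return t.simple
    return Arrow(erase(t.arg), erase(t.res))


def erase_env(env):
    """Erase every binding and drop every recorded branch condition.

    Entries are ("bind", x, τ) or ("assert", φ); the empty list models
    the empty environment, on which erasure is the identity.
    """
    return [(e[1], erase(e[2])) for e in env if e[0] == "bind"]


cons_ty = Fun("x", Base("int", "true"),
              Fun("y", Base("intlist", "true"),
                  Base("intlist", "Rmem(v) = Rid(x) U Rmem(y)")))
print(erase(cons_ty))
```

Erasing the dependent type of Cons yields the simple type int -> intlist -> intlist, which is the information the S-APP rule needs when it checks the argument of a relation application.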

The dependent type checking rules for λ_R expressions are
mostly standard, except for T-CONST and T-MATCH. The rule
T-CONST makes use of a function ty that maps a constant c to a
type (ty(c)), which remains its type under any Γ. The function ty
is defined below:

∀i ∈ Z, ty(i) = int

ty(Nil) = {ν : intlist | φ_n}

ty(Cons) = x : int -> y : intlist -> {ν : intlist | φ_c}

The type refinements of Nil (φ_n) and Cons (φ_c) in the T-MATCH
rule are conjunctive aggregations of Nil and Cons cases (resp.)
of all structural relation definitions found within a program. To
help us precisely define φ_n and φ_c, we assume the presence of
(a) a globally-defined finite map (Σ_R) that maps relation names
to their pattern-match definitions, and (b) a finite ordered map Γ_R
that maps relation names to their sorts. We implicitly parameterize
our typing judgment over Σ_R (i.e., our ⊢ is actually ⊢_(Σ_R, Γ_R)).
Inductive relations defined using the closure operator are assumed
to be unfolded to pattern-match definitions before being bound in
Σ_R:

R = R2*    Σ_R(R2) = ⟨Nil ⇒ r1, Cons x y ⇒ r2⟩
------------------------------------------------
Σ_R(R) = ⟨Nil ⇒ r1, Cons x y ⇒ r2 ∪ R(y)⟩

For the sake of presentation, we treat the pattern-match
definition of a structural relation as a map from constructor patterns
to relational expressions. Consequently, when Σ_R(R) = ⟨Nil ⇒
r1, Cons x y ⇒ r2 ∪ R(y)⟩, the notation Σ_R(R)(Nil) denotes r1,
and Σ_R(R)(Cons x y) denotes r2 ∪ R(y). With the help of Σ_R, we
now define φ_n and φ_c as:

φ_n = ⋀_{R ∈ dom(Σ_R)} R(ν) = Σ_R(R)(Nil)

φ_c = ⋀_{R ∈ dom(Σ_R)} R(ν) = Σ_R(R)(Cons x y)

For instance, consider a case where Σ_R has only one element (R)
in its domain:

Σ_R = [R ↦ ⟨Nil ⇒ Rid(0), Cons x y ⇒ Rid(x)⟩]

The types of Nil and Cons in such a case are as follows:

ty(Nil) = {ν : intlist | R(ν) = Rid(0)}

ty(Cons) = x : int -> y : intlist -> {ν : intlist | R(ν) = Rid(x)}
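
The conjunctive aggregation that produces the constructor refinements can be sketched as a fold over the map of relation definitions. The Python fragment below is a purely syntactic illustration under assumed conventions: the map is a dictionary from relation names to per-constructor expression strings, refinements are built as strings, the bound variable is written v, and conjunction is written "and"; none of these representation choices come from the paper.

```python
# Hypothetical encoding of the single-relation map from the example above.
SIGMA_R = {
    "R": {"Nil": "Rid(0)", "Cons x y": "Rid(x)"},
}


def refinement(sigma, ctor):
    """Conjoin R(v) = sigma(R)(ctor) over every relation R in dom(sigma)."""
    clauses = [f"{name}(v) = {cases[ctor]}" for name, cases in sigma.items()]
    return " and ".join(clauses)


phi_n = refinement(SIGMA_R, "Nil")
phi_c = refinement(SIGMA_R, "Cons x y")
print(phi_n)
print(phi_c)
```

Adding a second relation to the dictionary adds one conjunct to each refinement, which is why the types of Nil and Cons grow with the number of structural relations defined in a program.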

The T-MATCH rule type checks each branch of the match
expression under an environment that records the corresponding branch
condition. Additionally, the type environment for the Cons branch
is also extended with the types of matched pattern variables (x and
y). The branch condition for the Cons (alternatively, Nil) case is
obtained by substituting the test value (v) for the bound variable (ν)
in the type refinement of Cons (Nil). Intuitively, the branch
condition of Cons (alternatively, Nil) captures the fact that the value v
was obtained by applying the constructor Cons (Nil); therefore, it
should satisfy the invariant of Cons (Nil). For instance, consider the
match expression:

match z with Cons x y ⇒ e1 else e2
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where Cons has typqj 

Cons : x:\nt — > yrintlist — > 

{v : intlist | R mem (u) = R ld (x) U R mem (y)} 

Expression e\ is type-checked under the extended environment: 

r, a;:{i^:int j true}, :rs:{i/:intlist | true}, 
R O) = Ri d (x) U R Hi) 

The subtyping rules allow us to propagate dependent type
information, and relate the subtype judgment to a notion of semantic
entailment ($\models$) in logic. The cornerstone of subtyping is the
subtyping judgment between base dependent types defined by the rule
SUBT-BASE. The rule refers to the map $\Gamma_R$ that provides sorts for
relations occurring free in type refinements. Intuitively, the rule
asserts dependent type $\tau_1$ to be a subtype of $\tau_2$ if and only if:

• Their base types match, and,

• Given a logical system L, and interpretations of the type
environment ($\Gamma, \nu : T$) and the type refinement $\phi_1$ (of $\tau_1$)
in L, the following implication holds in L:

$$[\![\Gamma, \nu : T]\!] \Rightarrow [\![\phi_1]\!] \Rightarrow [\![\phi_2]\!]$$

The context under which the implication has to be valid
($[\![\Gamma_R]\!]$) is the interpretation of sort bindings of relations in L.

The soundness of $\lambda_R$'s type system is defined with respect to a
reduction relation ($\longrightarrow$) that specifies the language's
operational semantics:

THEOREM 3.1. (Type Safety) If $\cdot \vdash e : \tau$, then either $e$ is a
value, or there exists an $e'$ such that $e \longrightarrow e'$ and
$\cdot \vdash e' : \tau$.



3.3 Semantics of the Specification Language 

The semantics of our specification language is defined via a trans- 
lation from well-typed relational expressions and well-formed type 
refinements to quantified propositions of many-sorted first-order 
logic (MSFOL). 

Many-sorted first-order logic extends first-order logic (FOL) 
with sorts (types) for variables. For our purpose, we only consider 
the extension with Booleans and uninterpreted sorts, i.e., sorts that, 
unlike int , do not have an attached interpretation. Ground terms, 
or quantifier-free formulas, of MSFOL are drawn from proposi- 
tional logic with equality and n-ary uninterpreted functions. 

Our MSFOL semantics make use of the $\Sigma_R$ map defined previously.
For perspicuity, we introduce the following syntactic sugar:

$$\Sigma_R(R)(\mathtt{Cons}\ v_1\ v_2) = [v_2/y]\,[v_1/x]\ \Sigma_R(R)(\mathtt{Cons}\ x\ y)$$

Further, we also assume a finite ordered map $\Gamma_R$ that maps
structural relations to their sorts. That is, for all $R$ such that
$\vdash R :: \tau_R$, we have that $\Gamma_R(R) = \tau_R$.

Fig. [4] describes the MSFOL semantics of $\lambda_R$'s specification
language. The semantics is operational in the sense that it describes
an algorithm to compile assertions in $\lambda_R$ type refinements to
formulas in MSFOL. Our semantics are parameterized over an auxiliary
function ($\mathcal{F}$) that maps $\lambda_R$ datatypes to uninterpreted
sorts in MSFOL. The specific uninterpreted sorts that types map to are
not relevant here. However, $\mathcal{F}$ has to be a total function over
$\lambda_R$ datatypes.
Note that despite treating interpreted types (e.g., intlist and int) as
uninterpreted sorts in the underlying logic, the exercise of ascribing 
a semantics to the type refinement language is complete. This is be- 
cause the interpretation of any type is the collection of operations 
allowed on that type, and our type refinement language does not 
contain operations that are specific to values of any specific type. 

5 In our examples, we assign the same names to formal and actual arguments 
for convenience. 



Relations translate to uninterpreted functions with a Boolean
co-domain in MSFOL. We choose to curry sorts of uninterpreted
functions representing relations ($R$) to simplify the semantics. The
auxiliary function $\eta_{wrap}$ wraps an uninterpreted function under
a quantified formula; this can be construed as an eta-equivalent
abstraction of an uninterpreted function in prenex quantified logic.
As an example, suppose we have

R :: intlist :→ {int}

That is, $\Gamma_R$ maps $R$ to intlist :→ {int}. Assume that
$[\![int]\!] = A_0$ and $[\![intlist]\!] = A_1$. Now,

$$\begin{aligned}
[\![R]\!] &= \eta_{wrap}(R, [\![\Gamma_R(R)]\!]) \\
 &= \eta_{wrap}(R, [\![intlist :\rightarrow \{int\}]\!]) \\
 &= \eta_{wrap}(R, [\![intlist]\!] \rightarrow [\![\{int\}]\!]) \\
 &= \eta_{wrap}(R, A_1 \rightarrow A_0 \rightarrow bool) \\
 &= \forall(k : A_1).\ \eta_{wrap}(R\,k,\ A_0 \rightarrow bool) \\
 &= \forall(k : A_1).\forall(j : A_0).\ \eta_{wrap}(R\,k\,j,\ bool) \\
 &= \forall(k : A_1).\forall(j : A_0).\ R\,k\,j
\end{aligned}$$
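The unfolding above can be mimicked by a small recursive function. The following is only an illustrative sketch, not the CATALYST implementation: the name `eta_wrap`, the representation of a curried sort as a list of argument sort names, and the generated variable names `k0`, `k1`, ... are all assumptions made for this example.

```python
from itertools import count

def eta_wrap(term, arg_sorts, fresh=None):
    """Wrap the uninterpreted function `term` in one universal quantifier
    per argument sort; the Boolean co-domain is implicit.

    `arg_sorts` is a list of sort names such as ["A1", "A0"], standing for
    the curried sort A1 -> A0 -> bool. The result is a formula string.
    """
    fresh = fresh or (f"k{i}" for i in count())
    if not arg_sorts:
        # Only the co-domain is left: eta_wrap(t, bool) = t
        return term
    var = next(fresh)
    head, rest = arg_sorts[0], arg_sorts[1:]
    return f"forall ({var} : {head}). " + eta_wrap(f"{term} {var}", rest, fresh)

print(eta_wrap("R", ["A1", "A0"]))
```

Each recursive call peels one argument sort off the curried sort and applies the function symbol to the freshly bound variable, which is exactly the shape of the derivation for $[\![R]\!]$.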

Auxiliary function Inst instantiates a prenex-quantified formula.
We employ the standard interpretation of set union and cross product
operations, when sets are represented using prenex-quantified
propositions:

$$\forall x.\phi_1 \ \cup\ \forall x.\phi_2 \;=\; \forall x.(\phi_1 \vee \phi_2)$$
$$\forall x.\phi_1 \ \times\ \forall y.\phi_2 \;=\; \forall x.\forall y.(\phi_1 \wedge \phi_2)$$

Our semantics use syntactic rewrite functions, $\gamma_{\cup}$ and
$\gamma_{\times}$, to perform this translation, and to move quantification
to prenex position when composing quantified formulas using logical
connectives.

To demonstrate the compilation process, we consider the following
$\lambda_R$ assertion:

Rob(l) = Rid(x) × Rmem(xs)

involving membership and occurs-before relations for integer lists:

Rmem :: intlist :→ {int}
Rob  :: intlist :→ {int*int}

The series of steps that compile the assertion to an MSFOL formula,
which captures the semantics of the assertion, are shown in
Fig. 5. The example assumes that $\mathcal{F}$ maps int to sort $A_0$, and
intlist to sort $A_1$.

The semantics of types and type refinements given in Fig. [4] can
be lifted in a straightforward way to the level of type environments
($\Gamma$):

$$[\![\Gamma, x : \{\nu : T \mid \phi\}]\!] = [\![\Gamma]\!] \Rightarrow x : [\![T]\!] \Rightarrow [\![[x/\nu]\phi]\!]$$
$$[\![\Gamma, \phi]\!] = [\![\Gamma]\!] \Rightarrow [\![\phi]\!]$$
$$[\![\cdot]\!] = true$$

The interpretation of the relation sort environment ($\Gamma_R$) is a set of
assertions over MSFOL sorts of uninterpreted relations:

$$[\![\Gamma_R, R :: \tau_R]\!] = [\![\Gamma_R]\!] \cup \{R : [\![\tau_R]\!]\}$$
$$[\![\cdot]\!] = \{\}$$

The following lemma states that the translation to MSFOL is
complete for a well-formed type refinement:

LEMMA 3.2. (Completeness of semantics) For all $\phi$, $\Gamma$, if
$\Gamma \vdash \phi$, then there exists an MSFOL proposition $\phi^L$ such
that $[\![\phi]\!] = \phi^L$.



We focus only on the underlined part of the assertion as compilation stack 
increases. We switch back to showing complete assertion when all sub-parts 
are reduced. The digit before the dot in a step number indicates this switch. 






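$$\begin{aligned}
&[\![R_{ob}(l) = R_{id}(x) \times R_{mem}(xs)]\!] && (1.1)\\
&\gamma_{\cup}([\![R_{ob}(l)]\!],\ \Leftrightarrow,\ [\![R_{id}(x) \times R_{mem}(xs)]\!]) && (1.2)\\
&\quad Inst([\![R_{ob}]\!],\ l) && (2.1)\\
&\quad Inst(\forall(i : [\![intlist]\!]).\forall(j : [\![int]\!]).\forall(k : [\![int]\!]).(R_{ob}\ i\ j\ k),\ l) && (2.2)\\
&\quad Inst(\forall(i : A_1).\forall(j : A_0).\forall(k : A_0).(R_{ob}\ i\ j\ k),\ l) && (2.3)\\
&\quad \forall(j : A_0).\forall(k : A_0).(R_{ob}\ l\ j\ k) && (2.4)\\
&\gamma_{\cup}([\![R_{ob}(l)]\!],\ \Leftrightarrow,\ [\![R_{id}(x) \times R_{mem}(xs)]\!]) && (1.2)\\
&\quad \gamma_{\times}([\![R_{id}(x)]\!],\ \wedge,\ [\![R_{mem}(xs)]\!]) && (3.1)\\
&\qquad Inst(\forall(i : [\![int]\!]).\forall(j : [\![int]\!]).(i = j),\ x) && (4.1)\\
&\qquad \forall(j : A_0).(x = j) && (4.2)\\
&\quad \gamma_{\times}([\![R_{id}(x)]\!],\ \wedge,\ [\![R_{mem}(xs)]\!]) && (3.1)\\
&\qquad \forall(k : A_0).(R_{mem}\ xs\ k) && (5.1)\\
&\quad \gamma_{\times}(\forall(j : A_0).(x = j),\ \wedge,\ \forall(k : A_0).(R_{mem}\ xs\ k)) && (3.2)\\
&\quad \forall(j : A_0).\forall(k : A_0).(x = j) \wedge (R_{mem}\ xs\ k) && (3.3)\\
&\gamma_{\cup}(\forall(j : A_0).\forall(k : A_0).(R_{ob}\ l\ j\ k),\ \Leftrightarrow,\ \forall(j : A_0).\forall(k : A_0).(x = j) \wedge (R_{mem}\ xs\ k)) && (1.3)\\
&\forall(j : A_0).\forall(k : A_0).(R_{ob}\ l\ j\ k) \Leftrightarrow (x = j) \wedge (R_{mem}\ xs\ k) && (1.4)
\end{aligned}$$

Figure 5: Compiling a $\lambda_R$ assertion to MSFOL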
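To see what the final formula (1.4) says, it can be evaluated over a small finite universe. The sketch below is an illustration under stated assumptions: relations are modelled as Python sets of tuples, and the helper names (`r_mem`, `r_id`, `cross`, `r_ob_cons`, `formula_1_4`) are hypothetical names chosen for this example. An SMT solver reasons about the formula symbolically rather than by enumeration.

```python
from itertools import product

def r_mem(xs):
    """Membership relation: the set of 1-tuples of list elements."""
    return {(v,) for v in xs}

def r_id(x):
    """Identity relation applied to x."""
    return {(x,)}

def cross(r1, r2):
    """Cross product of two relations; tuples are concatenated."""
    return {a + b for a in r1 for b in r2}

def r_ob_cons(x, xs):
    """The set denoted by Rid(x) x Rmem(xs): the head paired with each tail member."""
    return cross(r_id(x), r_mem(xs))

def formula_1_4(r_ob_l, x, xs, universe):
    """forall j k. (Rob l j k) <=> (x = j) and (Rmem xs k), over a finite universe."""
    return all(((j, k) in r_ob_l) == (x == j and (k,) in r_mem(xs))
               for j, k in product(universe, repeat=2))

# The relation built from the right-hand side satisfies the formula ...
print(formula_1_4(r_ob_cons(1, [2, 3]), 1, [2, 3], range(5)))
# ... while an unrelated set of pairs does not.
print(formula_1_4({(2, 3)}, 1, [2, 3], range(5)))
```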

3.4 Decidability of $\lambda_R$ Type Checking

The subtyping judgment in our core language ($\lambda_R$) relies on the
semantic entailment judgment of MSFOL. The premise of SUBT-BASE
contains the following:

$$[\![\Gamma_R]\!] \models [\![\Gamma, \nu : T]\!] \Rightarrow [\![\phi_1]\!] \Rightarrow [\![\phi_2]\!]$$

Consequently, decidability of type checking in $\lambda_R$ reduces to
decidability of semantic entailment in MSFOL. Although semantic
entailment is undecidable for full first-order logic, our subset
of MSFOL is a carefully chosen decidable fragment. This fragment,
known as Effectively Propositional (EPR) first-order logic, or
Bernays-Schönfinkel-Ramsey (BSR) logic, consists of prenex quantified
propositions with uninterpreted relations and equality.
Off-the-shelf SMT solvers (e.g., Z3) are equipped with efficient
decision procedures for EPR logic [19], making type checking in
$\lambda_R$ a practical exercise.

THEOREM 3.3. (Decidability) Type checking in $\lambda_R$ is decidable.

Proof. Follows from Lemma 3.2 and the decidability proof of EPR
logic. □

4. Parametricity 

4.1 Syntax 

We now extend our core language ($\lambda_R$) with parametric
polymorphism, and the specification language with parametric relations -
relations parameterized over other relations. We refer to the
extended calculus as $\lambda_{\forall R}$. Figure [6] shows the type and
specification language of $\lambda_{\forall R}$. We have elided
$\lambda_{\forall R}$'s expression language in the interest of space.
Unmodified syntactic forms of $\lambda_R$ are also elided.

The only algebraic data type in $\lambda_{\forall R}$ is a polymorphic list,
which is the domain for structural relations. Consequently, structural
relations have sort schemes ($\sigma_R$), akin to type schemes ($\sigma$) of
the term language. For example, the non-parametric head relation
($R_{hd}$) from Section [2], when defined over a polymorphic 'a list,
will have the sort scheme ∀'a. 'a list :→ {'a}. The specification
language also contains an expression ($\mathcal{R}\,T$) to instantiate a
generalized type variable in parametric relation sorts.

A parametric relation generalizes a structural relation, just as 
a polymorphic list generalizes a monomorphic one. Our syntax 



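Calculus $\lambda_{\forall R}$

$t \in$ tuple-sort variables    $x, y, k \in$ variables
$'a, 'b \in$ type variables

$T ::= {'a} \mid {'a}\ list \mid int$    datatypes
$\tau ::= \{\nu : T \mid \Phi\} \mid (x : \tau) \rightarrow \tau$    dependent type
$\delta ::= \forall t.\ \forall(R :: {'a} :\rightarrow \{t\}).\ \delta \mid \tau$    parametric dep. type
$\sigma ::= \forall {'a}.\ \sigma \mid \delta$    type scheme

Specification Language

$\Phi ::= \rho = \rho \mid \rho \subset \rho \mid \Phi \wedge \Phi \mid true$    type refinement
$\rho ::= \mathcal{R}(x) \mid \rho \cup \rho \mid \rho \times \rho$    rel. expression
$\mathcal{R} ::= \mathcal{R}\,T \mid \mathcal{R}\,\theta\,\mathcal{R} \mid R$    instantiation
$\theta ::= t \mid t * \theta \mid T * \theta \mid T$    tuple sort
$\tau_R ::= \forall t.\ ({'a} :\rightarrow \{t\}) :\rightarrow ({'a}\ list :\rightarrow \{\theta\}) \mid {'a}\ list :\rightarrow \{\theta\}$    relation sort
$\sigma_R ::= \forall {'a}.\ \tau_R \mid \tau_R$    sort scheme
$\Delta_R ::= \langle R, R_p, \sigma_R, \mathtt{Cons}\ x\ y \Rightarrow r \mid \mathtt{Nil} \Rightarrow r \rangle \mid \langle R, R_p, \sigma_R, \mathcal{R}^{*} \rangle$    rel. definition

Figure 6: $\lambda_{\forall R}$ - Language with parametric relations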

and semantics for parametric relations are based on this
correspondence. Since the list type constructor takes only one type
argument, structural relations in $\lambda_{\forall R}$ are parameterized
over one relational parameter. The domain of a relational parameter to
a structural relation over a 'a list should be 'a. When the type
variable in 'a list is instantiated with, e.g., 'b list, the parameter
of a parametric relation over 'a list can be instantiated with a
structural relation over 'b list. For instance, the relational
parameter $R$ in the parametric membership relation ($R_{mem}\ R$),
defined in Sec. [2], can be instantiated with the non-parametric head
relation, $R_{hd}$, after instantiating 'a in its sort scheme with a
'b list. The resulting relation can now be applied to a list of lists
(i.e., a 'b list list) to denote the set of head elements in the
constituent lists.

The definition ($\Delta_R$) of a parametric relation is a tuple
containing its name ($R$), the name of its relational parameter ($R_p$),
its sort scheme ($\sigma_R$), and its definition. A parametric relation
definition very often does not place constraints over the co-domain of
its relational parameter. For instance, consider the parametric
$R_{hd}$ relation over 'a list reproduced from Section [2]:

relation (Rhd R) (x::xs) = R(x)
       | (Rhd R) [] = ∅

$R_{hd}$ requires that the domain of its parameter be 'a, but it places
no restriction on the co-domain of $R$. In order to have a truly
parametric definition of $R_{hd}$, it is essential that we let the
relational parameter have an unrestricted co-domain. Therefore, we let
tuple-sort variables ($t$) be used in tuple sorts ($\theta$). Such a
variable can be instantiated with a tuple sort, such as int*int.

In order to use a parametric relation in a type refinement, its
relational parameter has to be instantiated. Polymorphism in
$\lambda_{\forall R}$ is predicative, so parameterization over relations in
$\lambda_{\forall R}$ is also predicative. An instantiated parametric
relation is equivalent to
a non-parametric relation; it can be applied to a variable of the 
term language, and can also be used to instantiate other parametric 
relations. 



7 A note on notation: We use ($R_{mem}\ R$) and ($R_{hd}\ R$) to denote
parametric membership and head relations, resp. We continue to use
$R_{mem}$ and $R_{hd}$ to denote their non-parametric versions. We use
qualifiers "parametric" and "non-parametric" to disambiguate.

$r ::= R(x) \mid r \times r$
$F_R ::= \lambda(\overline{x : T}).\ r$    transformer
$\mathcal{E}_b ::= bind(R(x), F_R)$    bind expression
$E_b ::= \lambda(x : T).\ bind(R(x), F_R)$    bind abstraction
$R = E_b$    bind equation
$\psi_b ::= \lambda R.\ E_b$    bind definition

Figure 7: Bind Syntax



To extend the generality of parametric relations to dependent
types of the term language, we lift the parameterization over
relations from the level of type refinements to the level of types. We
refer to dependent types parameterized over relations as parametric
dependent types ($\delta$). An example of a parametric dependent type is
the type of foldl from Section [2]. Another example is the type of
map shown below:

('R1, 'R2) map :
    l → (f : x → {ν | 'R2(ν) = 'R1(x)}) →
    {ν | ((Rob 'R2)* ν) = ((Rob 'R1)* l)}

4.2 Sort and Type Checking 

Rules to check sorts of relational expressions and well-formedness
of type refinements ($\Phi$) in $\lambda_{\forall R}$ are straightforward
extensions of similar rules for $\lambda_R$ and are omitted here.
Sort-checking a parametric relation definition reduces to sort-checking
a non-parametric relation definition under an environment extended with
the sort of its relational parameter. Checking the sort of a relation
instantiation is the same as checking the sort of a function
application in other typed calculi, such as System F, as are rules to
type-check generalization and instantiation expressions.

4.3 Semantics of Parametric Relations 

Before we describe our semantics for parametric relations, we 
present a few auxiliary definitions: 

Ground Relations. A ground relation of a parametric relation ($R$)
is a non-parametric relation obtained by instantiating the relational
parameter with the identity relation $R_{id}$ in its definition. Since we
require the co-domain of the relational parameter to be a tuple-sort
variable ($t$), an instantiation of the parameter with $R_{id}$ is
always sort-safe. Therefore, there exists a ground relation for every
parametric relation in $\lambda_{\forall R}$.

Transformer Expression. A transformer expression ($F_R$) is a $\lambda_R$
relational expression under a binder that binds a tuple of variables.
A transformer expression is expected to transform the tuple to
a set of tuples through a cross-product combination of relation
applications. The sort of a transformer application is a map (under
:→) from a tuple sort ($\theta_1$) to a set sort ($\{\theta_2\}$). An
example of a transformer expression of sort 'a :→ {'a*'a} is the
reflexive transformer:

λx. Rid(x) × Rid(x)

Bind Expressions. Consider an operator that accepts a relation
application and a transformer expression ($F_R$), applies $F_R$ over
every tuple in the set representing a relation application, and
subsequently folds the resulting set of sets using set union. Such an
operator has the following sort:

$$\forall t_1, t_2.\ \{t_1\} :\rightarrow (t_1 :\rightarrow \{t_2\}) :\rightarrow \{t_2\}$$

We name the operator bind, after set monadic bind. The syntax
of bind expressions is given in Fig. [7]. For brevity, we exclude sort
annotations on transformer expressions ($F_R$) and bind abstractions
($E_b$) in our examples.
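The operator just described is the usual monadic bind for sets. The following minimal sketch models it over Python sets of tuples; the function names (`bind`, `r_id`, `cross`, `r_mem`, `reflexive`) are illustrative assumptions and are not taken from the CATALYST sources.

```python
def bind(relation, transformer):
    """Apply `transformer` to every tuple of `relation` and union the results."""
    out = set()
    for tup in relation:
        out |= transformer(tup)
    return out

def r_id(x):
    """Identity relation applied to x."""
    return {(x,)}

def cross(r1, r2):
    """Cross product of two relations; tuples are concatenated."""
    return {a + b for a in r1 for b in r2}

def r_mem(l):
    """Non-parametric membership relation of a list."""
    return {(v,) for v in l}

def reflexive(tup):
    """The reflexive transformer: maps (x) to Rid(x) x Rid(x)."""
    return cross(r_id(tup[0]), r_id(tup[0]))

# Binding the membership relation with the reflexive transformer
# yields the reflexive pairs of the list's elements.
print(sorted(bind(r_mem([1, 2, 2, 3]), reflexive)))
```

Because the result is a set, duplicate list elements contribute a single tuple, matching the set-based reading of relation applications.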



By binding a relation application with a transformer expression,
a bind expression effectively creates a new relation. For instance,
given a list l with type 'a list, the bind expression that binds
$R_{mem}(l)$ with a reflexive transformer is as follows:

bind(Rmem(l), λx. Rid(x) × Rid(x))

The result of evaluating this expression is the set of reflexive pairs
of elements in the list, which is equivalent to instantiating $R_{mem}$
with $R_{dup}$:

(Rmem Rdup)(l) = bind(Rmem(l), λx. Rid(x) × Rid(x))

Here, equality is interpreted as equality of sets on both sides. Since
the semantics of a relation application is the set of tuples, the above
equation defines the semantics of ($R_{mem}\ R_{dup}$) in terms of its
ground relation $R_{mem}$. Indeed, a parametric $R_{mem}$ relation (call it
$R^{\pi}_{mem}$) can be defined equivalently in terms of its
non-parametric variant as:

$R^{\pi}_{mem}$ = λR. λl. bind(Rmem(l), λx. R(x))

We refer to the above definition as the bind definition of the
parametric $R_{mem}$ relation. Every well-sorted parametric structural
relation definition in $\lambda_{\forall R}$ can be transformed to a bind
definition that is extensionally equal, i.e., both produce the same set
of tuples for every instantiation and subsequent application.
Therefore, the pattern-match syntax used to define parametric relations
is simply syntactic sugar over its underlying bind definition.

4.3.1 Elaboration to Bind Definition 

Elaborating a parametric relation definition to a bind definition
requires that we construct its ground relation, and a transformer
expression ($F_R$). A ground relation definition is derived by
instantiating its parametric definition with $R_{id}$, as stated
previously. Constructing a transformer expression is equally simple -
one only needs to examine the co-domain tuple sort of the parametric
relation, which is also the co-domain tuple sort of the transformer
expression (from the type of bind). A sort variable in the tuple
sort is interpreted as application of its parameter relation, an
asterisk in the sort translates to a cross-product, and a
$\lambda_{\forall R}$ type in the tuple sort translates to application of
$R_{id}$. For instance, consider a hypothetical parametric relation $R_x$
with the following sort:

Rx :: ∀t. (int :→ {t}) :→ (int list :→ {int*t*t})

We let $R$ denote the relational parameter of $R_x$. The ground relation
of $R_x$ (call it $R_x'$) is the instantiated parametric relation
($R_x\ R_{id}$), which has the sort int list :→ {int*int*int}. From the
type of bind, we know that the sort of the required transformer
expression ($F_R$) is (int*int*int) :→ {int*t*t}. Recalling that $F_R$ is
a lambda-bound relational expression, which is a cross-product
combination of relation applications (Fig. [7]), we observe that the
only possible solution for $F_R$ is:

λ(x,y,z). Rid(x) × R(y) × R(z)

Consequently, we derive the following bind definition of $R_x$:

λR. λl. bind(Rx'(l), λ(x,y,z). Rid(x) × R(y) × R(z))
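The construction of the transformer from the co-domain tuple sort is mechanical, as the following sketch suggests. It is an illustration only: the function name `make_transformer`, the encoding of a tuple sort as a list of strings (with `"t"` standing for the tuple-sort variable), and the modelling of relations as Python functions returning sets of tuples are assumptions of this example.

```python
def make_transformer(codomain, param):
    """Build a transformer from the co-domain tuple sort of a parametric relation.

    Each component of `codomain` is either "t" (the tuple-sort variable,
    translated to an application of the relational parameter `param`) or a
    concrete type name (translated to an application of Rid). The asterisks
    between components become cross products.
    """
    def r_id(v):
        return {(v,)}

    def transformer(tup):
        result = {()}  # unit of the cross product
        for component, value in zip(codomain, tup):
            rel = param(value) if component == "t" else r_id(value)
            result = {a + b for a in result for b in rel}
        return result

    return transformer

# Co-domain int*t*t, with the parameter instantiated by a relation
# that duplicates its argument.
f_r = make_transformer(["int", "t", "t"], lambda v: {(v, v)})
print(f_r((1, 2, 3)))
```

Instantiating the parameter with the identity relation makes the transformer map each ground tuple to itself, which is consistent with the ground relation being the instantiation with $R_{id}$.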

4.3.2 Bind Equations 

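By substituting parametric relations with their bind definitions,
every instantiation of a parametric relation can be reduced to a
bind abstraction ($E_b$ in Figure [7]), which, like any non-parametric
structural relation in $\lambda_{\forall R}$, is a map from a 'a list to a
set of tuples. Therefore, an instantiated parametric relation can be
treated as a new non-parametric relation that is defined using bind.
For example, ($R_{mem}\ R_{dup}$) can be treated as a new non-parametric
relation $R_1$, defined in terms of bind:

R1 = λl. bind(Rmem(l), λx. Rid(x) × Rid(x))

Semantics of Bind Equations

$$\begin{aligned}
[\![R_2 = \lambda(x : T_1).\ bind(R_1(x), \lambda(k : T_2).\ r)]\!] =\ 
 & \forall(x : [\![T_1]\!]).\ \gamma_{\Rightarrow}([\![R_1(x)]\!],\ \forall(k : [\![T_2]\!]).[\![r]\!],\ [\![R_2(x)]\!]) \\
 \wedge\ & \forall(x : [\![T_1]\!]).\ \gamma_{\Leftarrow}([\![R_1(x)]\!],\ \forall(k : [\![T_2]\!]).[\![r]\!],\ [\![R_2(x)]\!])
\end{aligned}$$

$$\gamma_{\Rightarrow}(\forall(k : T_1^F).\phi_1^F,\ \forall(k : T_1^F).\forall(j : T_2^F).\phi_2^F,\ \forall(j : T_2^F).\phi_3^F) = \forall(k : T_1^F).\forall(j : T_2^F).\ \phi_1^F \wedge \phi_2^F \Rightarrow \phi_3^F$$

$$\gamma_{\Leftarrow}(\forall(k : T_1^F).\phi_1^F,\ \forall(k : T_1^F).\forall(j : T_2^F).\phi_2^F,\ \forall(j : T_2^F).\phi_3^F) = \forall(j : T_2^F).\exists(k : T_1^F).\ \phi_3^F \Rightarrow \phi_1^F \wedge \phi_2^F$$

Figure 8: Semantics of bind equations for parametric relations in $\lambda_{\forall R}$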

By rigorously defining the semantics of bind equations as above,
we can effectively capture the semantics of any instantiation of
a parametric relation in terms of its ground relation. This is the
insight that allows us to use parametric relations seamlessly in type
refinements. For instance, the bind semantics for ($R_{mem}\ R_{dup}$)
lets us prove the following implication, which could potentially
arise during subtype checking:

$$R_{mem}(l_1) = R_{mem}(l_2) \ \Rightarrow\ ((R_{mem}\ R_{dup})\ l_1) = ((R_{mem}\ R_{dup})\ l_2)$$

The formal semantics of bind equations, which also define an
algorithm to compile bind equations to MSFOL formulas, is described
in Fig. [8]. Under our semantics, the bind equation for
($R_{mem}\ R_{dup}$) is interpreted as a conjunction of the following
first-order formulas (elaborated for clarity):

• If $(x) \in R_{mem}(l)$ and $(y) \in R_{id}(x) \times R_{id}(x)$, then
$(y) \in ((R_{mem}\ R_{dup})\ l)$.

• If $(y) \in ((R_{mem}\ R_{dup})\ l)$, then there must exist $x$ such that
$(x) \in R_{mem}(l)$ and $(y) \in R_{id}(x) \times R_{id}(x)$.

Since sets have no notion associated with them other than
membership, the above first-order assertions completely describe
$((R_{mem}\ R_{dup})\ l)$ in terms of $(R_{mem}\ l)$.
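The two implications can be checked directly on finite sets. The sketch below is illustrative and assumes a set-of-tuples model of relations; the names `forward`, `backward` and `dup` are hypothetical. It shows that only the intended set satisfies both directions: a set that is too small violates the first implication, and a set that is too large violates the second.

```python
def r_id(x):
    return {(x,)}

def cross(r1, r2):
    return {a + b for a in r1 for b in r2}

def dup(tup):
    """Transformer for Rdup: maps (x) to Rid(x) x Rid(x)."""
    return cross(r_id(tup[0]), r_id(tup[0]))

def forward(r1, transformer, r2):
    """If x is in r1 and y is in transformer(x), then y is in r2."""
    return all(y in r2 for x in r1 for y in transformer(x))

def backward(r1, transformer, r2):
    """If y is in r2, then some x in r1 has y in transformer(x)."""
    return all(any(y in transformer(x) for x in r1) for y in r2)

r_mem_l = {(1,), (2,)}                 # Rmem(l) for l = [1, 2]
exact = {(1, 1), (2, 2)}               # the intended (Rmem Rdup)(l)
too_small = {(1, 1)}
too_large = {(1, 1), (2, 2), (1, 2)}

for candidate in (exact, too_small, too_large):
    print(forward(r_mem_l, dup, candidate), backward(r_mem_l, dup, candidate))
```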

4.4 Decidability of Type Checking 

Type refinements ($\Phi$) in $\lambda_{\forall R}$ can be elaborated to a
conjunction of bind equations representing semantics of instantiated
relations, and a $\lambda_R$ type refinement ($\phi$). Consequently, we
have the following result:

THEOREM 4.1. (Decidability) Type checking in $\lambda_{\forall R}$ is
decidable.

Proof. Follows from the decidability proof of EPR logic, to which
bind equations are compiled, and the decidability result (Theorem
3.3) for $\lambda_R$. □



5. Implementation 

We have implemented our specification language and verification
procedure as an extended type-checking pass (called CATALYST)
in MLton [15], a whole-program optimizing compiler for Standard
ML (SML) (see footnote 8). The input to our system is CoreML, an
A-normalized intermediate representation with pattern-matching, but
with all SML module constructs elaborated and removed. SML
programs are annotated with relational specifications, defined in
terms of relational dependent types that decorate function
signatures, along with definitions of parameterized structural
relations over the program's datatypes. The type system is a
conservative extension of SML's, so all programs that are well-typed
under CATALYST are well-typed SML programs. Our type-checking and
verification process closely follows the description given in the
previous sections. Verification conditions, representing the consequent
of the SUBT-BASE type-checking rule (Fig. [3]), are compiled to a
first-order formula, as described in Sections [3] and [4], and checked
for validity (unsatisfiability of its negation) using the Z3 SMT solver.

8 The source code for the implementation as well as a Web interface to the
system is available online from: https://github.com/tycon/catalyst.

To be practically useful, our implementation extends the formal 
system described thus far in three important ways: 

Primitive Relations. We provide a general framework to add new
primitive relations that allows the class of relational expressions to
be extended by permitting relational expressions to be abstracted
in prenex form. The framework only needs to be seeded with the
single primitive relation $R_{id}$. For example, $R_{notEq}\ k$ can be
defined as the following primitive relation:

RnotEq = λk. λx. Rid(x) - Rid(k)

Similarly, $R_{eq}\ k$ can be defined as:

Req = λk. λx. Rid(x) - (Rid(x) - Rid(k))

Both $R_{notEq}$ and $R_{eq}$ can be ascribed colon-arrow sorts, similar
to structural relations. Once defined, a primitive relation can be
used freely in type refinements. For example, the relation yielded
by evaluating ($R_{notEq}\ c$) can be used to instantiate the parametric
$R_{mem}$ relation to define the set of all elements in a list that are
not equal to some constant c.
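Read over sets of tuples, the two definitions behave as one would expect. The following sketch is illustrative; the Python names (`r_not_eq`, `r_eq`, `r_mem_param`) are assumptions of this example, and `r_mem_param` is a direct set-based model of the parametric membership relation rather than the tool's encoding.

```python
def r_id(x):
    return {(x,)}

def r_not_eq(k):
    """RnotEq k = \\x. Rid(x) - Rid(k): {(x)} if x differs from k, else empty."""
    return lambda x: r_id(x) - r_id(k)

def r_eq(k):
    """Req k = \\x. Rid(x) - (Rid(x) - Rid(k)): {(x)} if x equals k, else empty."""
    return lambda x: r_id(x) - (r_id(x) - r_id(k))

def r_mem_param(rel):
    """Parametric membership: union of rel(x) over every element x of the list."""
    return lambda l: set().union(*(rel(x) for x in l))

# Elements of the list that are not equal to the constant 2.
print(sorted(r_mem_param(r_not_eq(2))([1, 2, 3])))
# Elements of the list that are equal to the constant 2.
print(sorted(r_mem_param(r_eq(2))([1, 2, 3])))
```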

Base Predicates: Consider the obvious relational refinement for the
polymorphic identity function:

id : x → {ν | Rid(ν) = Rid(x)}

The type refinement used here is an unintuitive way of expressing
the simple fact that id returns its argument. To avoid such
needless verbosity, we admit non-relational assertions (called base
predicates), drawn from propositional logic with equality, to our
specification language; these predicates may be freely composed in
type refinements using logical connectives.

Inference and Annotation Burden: Our implementation infers 
sorts for structural relations, and relational parameters in dependent 
types. Our term language and specification language have distinct 
sort instantiation expressions. We also infer appropriate tuple-sort 
instantiations by unification. Therefore, neither the ML program, 
nor the specification needs to be annotated with sorts. 

The type checking algorithm performs bi-directional type checking
[18], and needs annotations only for recursive function definitions.
For all other expressions, CATALYST synthesizes a suitable
dependent type. For example, types from different branches of ML
case expressions are unified using a logical disjunction. Generating
a suitable type for a let expression requires that we use
an existential quantifier in type refinements, which is skolemized
while encoding the VC in MSFOL. Notably, we do not expose any
quantifiers in our specification language.

datatype color = R | B
datatype 'a tree = E | T of color * 'a tree * 'a * 'a tree

fun balance (t : 'a tree) : 'a tree = case t of
    T (B, T (R, T (R,a,x,b), y, c), z, d) =>
      T (R, T (B,a,x,b), y, T (B,c,z,d))
  | T (B, T (R, a, x, T (R,b,y,c)), z, d) =>
      T (R, T (B,a,x,b), y, T (B,c,z,d))
  | T (B, a, x, T (R, T (R,b,y,c), z, d)) =>
      T (R, T (B,a,x,b), y, T (B,c,z,d))
  | T (B, a, x, T (R, b, y, T (R,c,z,d))) =>
      T (R, T (B,a,x,b), y, T (B,c,z,d))
  | _ => t

(a) balance

(* Tree head (root) relation *)
relation Rthd (T (c,l,n,r)) = {(n)};
(* Tree membership relation *)
relation Rtmem = Rthd*;
(* Total-order relation among tree members *)
relation Rto (T (c,l,n,r)) = Rtmem(l) × {(n)}
                           ∪ {(n)} × Rtmem(r)
                           ∪ Rtmem(l) × Rtmem(r);
(*
 * "balance" preserves the total-order among members
 * of the tree
 *)
balance : t → {t' | Rto*(t') = Rto*(t)};

(b) Relational specification of balance

Figure 9: Red-Black Tree Example



For non-recursive function applications, although it is possible 
to infer instantiation annotations for parametric relations with the 
help of an expensive fixpoint computation that generates an exhaus- 
tive list of all possible instantiations, CATALYST relies on man- 
ual annotations for parameter instantiations to avoid this cost. An 
example of such annotation is shown in Fig. [10c] (the contains 
function). 

5.1 Experiments 

We have investigated the automatic verification of expressive shape 
invariants using CATALYST on a number of programs, including: 

1. List library functions, such as concat, rev, revAppend,
foldl, foldr, zip, unzip etc. (some of these specifications
have been discussed in Sec. [2] and [4]),

2. Okasaki's red-black tree [17] library functions, such as balance,
multiple order traversal functions, and mirrorImage, and

3. Compiler transformations over MLton's SSA (Static Single
Assignment) intermediate representation.

For several of these benchmarks (especially those in (1) and (2)),
CATALYST was able to successfully verify specifications to the
extent of full functional correctness. Excluding the time taken by the
MLton compiler to elaborate and type check these Standard ML
programs, none of our benchmarks take more than 0.2s to verify;
this time includes A-Normalization, specification elaboration, VC
generation, and SMT solving through Z3.

Red-Black Tree. The specification of the red-black tree balance
function, shown in Fig. [9b], illustrates the kind of specifications
that were automatically verified by CATALYST in our experiments.
The specification asserts that the balance function on red-black
trees (Fig. [9a]) preserves a total-order among members of the
tree. The non-inductive total-order relation ($R_{to}$ in Fig. [9b]) is
defined in terms of the tree membership relation ($R_{tmem}$) described
in Sec. [2.3], and relates (a) elements in the left sub-tree to the
root element, (b) the root to the elements in the right sub-tree, and
(c) elements in the left sub-tree to those in the right. The inductive
total-order relation ($R_{to}^{*}$) on a red-black tree, obtained by
closing the $R_{to}$ relation over the tree, relates every pair of
elements in the tree that are in-order. Consequently, the specification
of the balance function effectively asserts that in-order traversal
over an unbalanced red-black tree, and in-order traversal on its
balanced version, return the same sequence of elements.
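The invariant can be exercised on concrete trees with a small executable model. The sketch below is a dynamic test on two inputs and not a substitute for the static verification performed by CATALYST; the tuple encoding of trees (`(color, left, value, right)` with `None` as the empty tree) and all function names are assumptions made for this illustration.

```python
E = None  # empty tree; a node is the tuple (color, left, value, right)

def is_red(t):
    return t is not E and t[0] == "R"

def balance(t):
    """Rebalance a black node with a red child and red grandchild (four cases)."""
    if t is not E and t[0] == "B":
        _, l, v, r = t
        if is_red(l):
            _, ll, lv, lr = l
            if is_red(ll):                       # case 1: left-left
                _, a, x, b = ll
                return ("R", ("B", a, x, b), lv, ("B", lr, v, r))
            if is_red(lr):                       # case 2: left-right
                _, b, y, c = lr
                return ("R", ("B", ll, lv, b), y, ("B", c, v, r))
        if is_red(r):
            _, rl, rv, rr = r
            if is_red(rl):                       # case 3: right-left
                _, b, y, c = rl
                return ("R", ("B", l, v, b), y, ("B", c, rv, rr))
            if is_red(rr):                       # case 4: right-right
                _, c, z, d = rr
                return ("R", ("B", l, v, rl), rv, ("B", c, z, d))
    return t

def members(t):
    """Tree membership, as a plain set of elements."""
    return set() if t is E else members(t[1]) | {t[2]} | members(t[3])

def r_to(t):
    """Non-inductive total order at the root of t."""
    if t is E:
        return set()
    _, l, n, r = t
    left, right = members(l), members(r)
    return ({(a, n) for a in left} | {(n, b) for b in right}
            | {(a, b) for a in left for b in right})

def r_to_star(t):
    """Inductive total order: r_to collected over every sub-tree."""
    if t is E:
        return set()
    return r_to(t) | r_to_star(t[1]) | r_to_star(t[3])

left_heavy = ("B", ("R", ("R", E, 1, E), 2, E), 3, E)
right_heavy = ("B", E, 1, ("R", E, 2, ("R", E, 3, E)))
for tree in (left_heavy, right_heavy):
    print(r_to_star(balance(tree)) == r_to_star(tree))
```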

CATALYST can verify full functional correctness of standard 
tree traversal functions that return a list of elements. The relational 
specifications for such functions essentially relate different order 
relations on the input tree to an occurs-before order of the result list. 
For instance, a function inOrder that performs in-order traversal
on a red-black tree (t) returns a list (l) whose inductive
occurs-before relation is the same as t's inductive total-order
relation:

inOrder : t → {l | Rob*(l) = Rto*(t)}

SSA. An important intermediate representation used in MLton is 
a variant of SSA that is operated upon by a number of optimization 
passes. After each such pass, MLton checks the well-formedness 
of the output by checking, for example, that variable definitions 
dominate variable uses in the SSA dominator tree. Because MLton 
performs this check after every optimization pass, compile times 
can suffer, especially as program size scales. A potential applica- 
tion of CATALYST is to statically typecheck the integrity of SSA 
optimization passes, thereby eliminating this overhead. 

A program in SSA form is represented as a tree of basic blocks,
where each block consists of a set of straight-line instructions (e.g.,
definitions, assignments, primitive applications). The specification
of an SSA program makes use of several inductive relations: $R_{du}$,
the def-use relation, $R_{ud}$, the use-def relation, and
$R_{use\text{-}refl}$, the reflexive variant of $R_{use}$, the use relation,
that collects all variables used on the right-hand side of an
assignment. The def-use relation relates a def, i.e., a variable that
is defined using an assignment statement, to all uses that are
dominated by the definition. Conversely, $R_{ud}$ relates a use to all
defs that it dominates. With these definitions, we can express the
type of an SSA tree thus:

type ssa_tree = {ν : block tree | Ruse-refl(ν) ⊆ Rdu(ν) ∧
                                  Ruse-refl(ν) ∩ Rud(ν) = ∅}

This type captures the two essential structural properties of SSA: 

(1) every use of a variable must be dominated by its definition; and 

(2) no definition of a variable is ever dominated by its use. Verifying 
that a transformation pass over the SSA IR has the type: 

ssa_tree — > ssa_tree 

is tantamount to proving the transformation preserves the salient 
SSA invariant that definitions always dominate uses. 

6. Case Study 

An SML implementation of the untyped lambda calculus is shown
in Fig. [11]. The implementation makes use of auxiliary functions,
such as filter and contains, directly, and exists through
contains. By virtue of being compositional, our verification
process relies on expressive relational types of these auxiliary
functions, which can nevertheless be verified by CATALYST. We present
them below:

exists. Consider the higher-order exists function over lists
shown in Fig. [10a]; dependent type signatures are elided for brevity.

fun exists f l = case l of
    [] => false
  | x::xs =>
      let
        val v1 = exists f xs
        val v2 = f x
      in
        v1 orelse v2
      end

(a) exists

fun filter f l = case l of
    [] => []
  | x::xs =>
      let
        val xs' = filter f xs
      in
        if f x then x::xs'
        else xs'
      end

(b) filter

fun contains l str =
  let
    val isStr = fn x => x=str
    (* Instantiate the implicit
     * relational parameter in type
     * of "exists" with (REq str) *)
    val hasStr = exists (REq str)
                   isStr l
  in
    hasStr
  end

(c) contains

Figure 10: Examples



ML Program

datatype exp = Var of string
             | App of exp*exp
             | Abs of string*exp

fun freeVars e = case e of
    Var id => [id]
  | App (e1,e2) =>
      concat [freeVars e1, freeVars e2]
  | Abs (id,e') => filter (RNeq id)
      (fn fv => not (fv = id)) (freeVars e')

fun alphaConvert e = case e of
    Abs (id,e') =>
      let
        val fv_e' = freeVars e'
        val id' = createNewName fv_e' id
      in
        Abs (id', subst (Var id') id e')
      end
  | _ => raise Error
and subst e1 id e2 = case e2 of
    Var id' => if id = id'
               then e1 else e2
  | App (e21,e22) =>
      let
        val e21' = subst e1 id e21
        val e22' = subst e1 id e22
      in
        App (e21',e22')
      end
  | Abs (id',e2') => if id' = id then e2 else
      let
        val fv_e1 = freeVars e1
      in
        if contains fv_e1 id'
        then subst e1 id (alphaConvert e2)
        else Abs (id', subst e1 id e2')
      end

Relational Specification

relation Rfv (Var x) = {(x)}
       | Rfv (App (e1,e2)) = Rfv(e1) ∪ Rfv(e2)
       | Rfv (Abs (id,e)) = Rfv(e) - {(id)};

createNewName : fvs → id → {ν | not (ν = id) ∧ not ({(ν)} ⊂ Rmem(fvs))};
freeVars : e → {l | Rmem(l) = Rfv(e)};
alphaConvert : e → {ex | Rfv(ex) = Rfv(e)};
subst : e1 → id → e2 →
    {ex | if ({(id)} ⊂ Rfv(e2))
          then Rfv(ex) = (Rfv(e2) - {(id)}) ∪ Rfv(e1)
          else Rfv(ex) = Rfv(e2)};

Figure 11: SML implementation and specification of the untyped lambda calculus.



A type that captures the semantics of exists , irrespective of its 
implementation, should assert that exists returns true if and 
only if its higher-order argument returns true for some member 
of the list. We express the invariant as the following type: 

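$$('R\ \texttt{exists}) : l \rightarrow (f : x \rightarrow \{\nu \mid \nu = \text{true} \Leftrightarrow {'R}(x) \neq \emptyset\}) \rightarrow \{\nu \mid \nu = \text{true} \Leftrightarrow ((R_{mem}\ {'R})\ l) \neq \emptyset\}$$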

The type is interpreted as follows. Let $'R$ be a relation such that
f returns true if and only if $'R(x)$ is not the empty set for f's
argument x. Then exists returns true if and only if $'R$ is not the
empty set for some element of the list l.
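
The reading above can be illustrated with a small run-time check. This is only a sketch under stated assumptions: relations are modelled as lists standing for finite sets, and the names rmemOver, rEven and existsSpecHolds are hypothetical helpers introduced for illustration. CATALYST itself discharges the refinement statically and does not execute anything like this.

```sml
(* exists as in Fig. 10a *)
fun exists f l = case l of
    [] => false
  | x :: xs =>
      let
        val v1 = exists f xs
        val v2 = f x
      in
        v1 orelse v2
      end

(* Hypothetical run-time model of ((Rmem 'R) l): the union of r(x)
 * over every member x of l, with sets represented as lists. *)
fun rmemOver r l = List.concat (List.map r l)

(* One possible instantiation of 'R for the predicate isEven:
 * r(x) is non-empty exactly when isEven x is true. *)
fun isEven x = x mod 2 = 0
fun rEven x = if isEven x then [x] else []

(* The refinement on the result of exists, checked dynamically. *)
fun existsSpecHolds l =
  exists isEven l = not (null (rmemOver rEven l))

val ok1 = existsSpecHolds [1, 3, 4, 7]
val ok2 = existsSpecHolds [1, 3, 5]
val ok3 = existsSpecHolds []
```

All three checks should evaluate to true, since rEven is non-empty for exactly those elements on which isEven holds.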

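filter. A parametric dependent type for filter, shown in Fig. 10b,
is given below:

$$('R\ \texttt{filter}) : l \rightarrow (f : x \rightarrow \{\nu \mid (\nu = \text{false} \Rightarrow {'R}(x) = \emptyset) \wedge (\nu = \text{true} \Rightarrow {'R}(x) = R_{id}(x))\}) \rightarrow \{\nu \mid R_{mem}(\nu) = ((R_{mem}\ {'R})\ l)\}$$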



The intuition behind this type is the same as that for exists:
filter retains only those elements of l for which its higher-order
argument returns true.
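
As a rough illustration of this type, the sketch below checks the result refinement of filter at run time. It assumes relations are represented as lists standing for sets. The helpers member, subset, sameSet, rmemOver, rid, rPositive and filterSpecHolds are hypothetical names, not part of CATALYST.

```sml
(* filter as in Fig. 10b *)
fun filter f l = case l of
    [] => []
  | x :: xs =>
      let
        val xs' = filter f xs
      in
        if f x then x :: xs' else xs'
      end

(* Sets are modelled as lists; equality is mutual inclusion. *)
fun member x l = List.exists (fn y => y = x) l
fun subset a b = List.all (fn x => member x b) a
fun sameSet a b = subset a b andalso subset b a

(* Hypothetical model of ((Rmem 'R) l) and of Rid. *)
fun rmemOver r l = List.concat (List.map r l)
fun rid x = [x]

(* An instantiation of 'R matching the argument refinement:
 * empty when the predicate is false, Rid(x) when it is true. *)
fun isPositive x = x > 0
fun rPositive x = if isPositive x then rid x else []

(* Rmem(result) = ((Rmem 'R) l), checked dynamically. *)
fun filterSpecHolds l =
  sameSet (filter isPositive l) (rmemOver rPositive l)

val ok1 = filterSpecHolds [3, ~1, 0, 7]
val ok2 = filterSpecHolds []
```

Note that the refinement speaks only about membership, so it would not distinguish a filter that also reorders the retained elements.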

contains. Consider the definition of the contains function
shown in Fig. 10c, which uses exists to check for the existence
of a constant string str in a list l. Since the higher-order
function passed to exists is:

val isStr = fn x => x=str 

the relational dependent type of isStr is: 

$$\texttt{isStr} : x \rightarrow \{\nu \mid \nu = \text{true} \Leftrightarrow (R_{eq}\ str)(x) \neq \emptyset\}$$

This clearly suggests that the relational parameter of exists has
to be instantiated with $(R_{eq}\ str)$. Having made this observation,
we stress that no type annotation is required for isStr, as it is a
non-recursive function.

Observe that the call to exists from contains includes 
explicit parameter instantiation. The resultant type of hasStr is: 

$$\texttt{hasStr} : \{\nu \mid \nu = \text{true} \Leftrightarrow ((R_{mem}\ (R_{eq}\ str))\ l) \neq \emptyset\}$$

The type refinement for hasStr indicates that hasStr is true if
and only if the set of all elements of list l that are equal to str is
not empty. Since its first-order encoding is equivalent to that of the
assertion

$$\{\nu = \text{true} \Leftrightarrow R_{id}(str) \subseteq R_{mem}(l)\},$$

the implementation of contains type-checks against the type:

$$l \rightarrow str \rightarrow \{\nu \mid \nu = \text{true} \Leftrightarrow R_{id}(str) \subseteq R_{mem}(l)\}$$
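
The final type can be read as a membership test, which the following sketch checks at run time. Plain SML has no relational parameters, so the explicit instantiation (REq str) from Fig. 10c is dropped here, and exists is written in a shortened form. The helpers member, subset and containsSpecHolds are hypothetical names used only for this illustration.

```sml
(* A shortened exists; it returns the same result as Fig. 10a. *)
fun exists f l = case l of
    [] => false
  | x :: xs => exists f xs orelse f x

(* contains from Fig. 10c without the relational instantiation,
 * which has no counterpart in plain SML. *)
fun contains l str =
  let
    val isStr = fn x => x = str
  in
    exists isStr l
  end

(* Sets are modelled as lists. *)
fun member x l = List.exists (fn y => y = x) l
fun subset a b = List.all (fn x => member x b) a

(* Rid(str) is the singleton {str}; Rmem(l) is the set of members
 * of l. The refinement says the result is true exactly when
 * Rid(str) is included in Rmem(l). *)
fun containsSpecHolds l str =
  contains l str = subset [str] l

val ok1 = containsSpecHolds ["a", "b", "c"] "b"
val ok2 = containsSpecHolds ["a", "b", "c"] "z"
val ok3 = containsSpecHolds [] "a"
```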

6.1 α-conversion

The substitution operation (subst) substitutes a free variable (id)
in an expression (e2) with another expression (e1). Function
alphaConvert consistently renames occurrences of the bound
variable in an abstraction expression. Observe that subst and
alphaConvert are mutually recursive definitions. Both functions
make use of freeVars, which returns a list of an expression's free
variables.

It is widely agreed that substitution and α-conversion operations
on lambda calculus terms are quite tricky to define correctly [6, 26].
Some of the behaviors exhibited by incorrect implementations
include (a) α-conversion renames a free variable, or fails to rename a
bound variable; (b) substitution fails to substitute free occurrences
of the variable (id), or substitutes a bound occurrence of the
variable; or (c) substitution is not capture-avoiding, i.e., substituting
e1 for id in e2 captures variables of e1 that are otherwise free.

The relational specification of substitution and α-conversion is
given in the bottom half of Fig. 11. Note that one need not expose
notions of capture-avoidance, or other such intricacies, to write
down the specification, which is given in terms of a new structural
relation $R_{fv}$ that relates an expression of the calculus to its free
variables. Function freeVars returns a list whose members are the
free variables of its input expression, and its type represents this fact.

CATALYST successfully verifies the implementation against its
specification. Alternate (incorrect) implementations, such as those
that fail to perform the capture-avoiding check on line 35 or the
free variable check on line 31, trigger a type error. Conversely, note
that, despite enforcing strong invariants, the relational
specifications for subst and alphaConvert do not constrain how these
functions are realized in ML. For instance, an implementation of
subst that proactively renames bound variables in e2 before
substitution is successfully verified against the same specification.
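
To make the specification in Fig. 11 concrete, the sketch below restates the implementation in plain SML and evaluates the refinements on subst and alphaConvert at run time. Several parts are assumptions rather than the paper's code: sets of names are modelled as lists, freeVars uses @ and List.filter in place of concat and the relationally instantiated filter, createNewName is a hypothetical priming scheme, and naiveSubst is a deliberately incorrect variant that omits the check on line 35. CATALYST establishes these properties statically; this dynamic check only illustrates what they say.

```sml
datatype exp = Var of string
             | App of exp * exp
             | Abs of string * exp

exception Error

(* Sets of names are modelled as lists. *)
fun member x l = List.exists (fn y => y = x) l
fun subset a b = List.all (fn x => member x b) a
fun sameSet a b = subset a b andalso subset b a
fun remove x l = List.filter (fn y => y <> x) l
fun contains l x = member x l

fun freeVars e = case e of
    Var id => [id]
  | App (e1, e2) => freeVars e1 @ freeVars e2
  | Abs (id, e') => remove id (freeVars e')

(* Hypothetical helper: append primes until the name avoids fvs. *)
fun createNewName fvs id =
  let
    val id' = id ^ "'"
  in
    if member id' fvs then createNewName fvs id' else id'
  end

fun alphaConvert e = case e of
    Abs (id, e') =>
      let
        val id' = createNewName (freeVars e') id
      in
        Abs (id', subst (Var id') id e')
      end
  | _ => raise Error
and subst e1 id e2 = case e2 of
    Var id' => if id = id' then e1 else e2
  | App (e21, e22) => App (subst e1 id e21, subst e1 id e22)
  | Abs (id', e2') =>
      if id' = id then e2
      else if contains (freeVars e1) id'
      then subst e1 id (alphaConvert e2)
      else Abs (id', subst e1 id e2')

(* A deliberately wrong variant that skips the capture check. *)
fun naiveSubst e1 id e2 = case e2 of
    Var id' => if id = id' then e1 else e2
  | App (e21, e22) => App (naiveSubst e1 id e21, naiveSubst e1 id e22)
  | Abs (id', e2') =>
      if id' = id then e2 else Abs (id', naiveSubst e1 id e2')

(* Run-time reading of the refinement on subst from Fig. 11. *)
fun substSpecHolds substFn e1 id e2 =
  let
    val fvEx = freeVars (substFn e1 id e2)
    val fvE2 = freeVars e2
  in
    if member id fvE2
    then sameSet fvEx (remove id fvE2 @ freeVars e1)
    else sameSet fvEx fvE2
  end

(* Run-time reading of the refinement on alphaConvert. *)
fun alphaSpecHolds e = sameSet (freeVars (alphaConvert e)) (freeVars e)

(* Substituting y for x in (fn y => x y) must not capture y. *)
val e2 = Abs ("y", App (Var "x", Var "y"))
val good = substSpecHolds subst (Var "y") "x" e2
val bad = substSpecHolds naiveSubst (Var "y") "x" e2
val alphaOk = alphaSpecHolds e2
```

Here good and alphaOk should be true, while bad should be false: naiveSubst captures the free y of e1, so the free-variable set of its result no longer matches the one the specification demands.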

7. Related Work 

Type systems of mainstream functional languages, such as GHC
Haskell and OCaml, support a basic form of dependent typing
[12, 13] using GADTs [27]. At a high level, a structural relation of
a data type is similar to a GADT insofar as it corresponds to an
index that tracks an inductively definable relation over the data type.
However, unlike the indexed type systems of Haskell and OCaml,
where types are kept separate from terms, ours is a dependent type
system. In this sense, our type system is similar to the
refinement-based dependent type system of F* [23]. Type refinements
in F* are drawn from unrestricted (higher-order) logic extended with
theories, whereas our specification language for ML programs is an
abstraction over first-order logic that was tailor-made for equational
and relational reasoning. The expressivity gained by allowing
unrestricted type refinements in F* comes at the cost of decidability
of type checking.

9 We introduce some syntactic sugar in defining type refinements. For
example, the branch expression (if $\phi$ then $\phi_1$ else $\phi_2$) in a
type refinement translates to $((\phi \wedge \phi_1) \vee (\neg\phi \wedge \phi_2))$.

Structural relations, in their operational manifestation, can be
compared to the structurally recursive measures of liquid types
[11, 25], where the co-domain is always a set. Parametric structural
relations may be viewed as generalizing such measures to
higher-order measures. Relationally parametric dependent types can be
compared to liquid types with abstract refinements [25], which let
liquid types parameterize over type refinements (Boolean
predicates). Once applied to a value, an abstract refinement becomes a
concrete refinement, which can only be used to refine a type. On
the other hand, a relational parameter can be treated just as any
other relation in our type refinements, including being passed as
an argument to other parametric relations. We require this
generality to reason about shape invariants of higher-order
catamorphisms such as map and foldr. For example, using only abstract
refinements, it is not possible to verify that projecting a list of
pairs using map and fst preserves ordering, or that an implementation
of list append that uses foldr is correct.
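
The map and fst example can be made concrete with a small run-time sketch. It assumes an occurs-before relation represented as a list of pairs, and the names rob, robFst and mapFstPreservesOrder are hypothetical. The check below only illustrates the property that a parametric occurs-before relation would let one state; it is not the encoding CATALYST uses.

```sml
(* Occurs-before relation of a list: all pairs (x, y) such that
 * x appears before y. Sets of pairs are modelled as lists. *)
fun rob l = case l of
    [] => []
  | x :: xs => List.map (fn y => (x, y)) xs @ rob xs

fun fst (a, _) = a

fun member x l = List.exists (fn y => y = x) l
fun subset a b = List.all (fn x => member x b) a
fun sameSet a b = subset a b andalso subset b a

(* The occurs-before relation of l with fst applied to both
 * components of every pair. *)
fun robFst l = List.map (fn (p, q) => (fst p, fst q)) (rob l)

(* Ordering is preserved if the occurs-before relation of the
 * projected list equals the projected occurs-before relation. *)
fun mapFstPreservesOrder l =
  sameSet (rob (List.map fst l)) (robFst l)

val ok = mapFstPreservesOrder [(1, "a"), (2, "b"), (3, "c")]
```

The right-hand side of the comparison applies a relation for fst inside the occurs-before relation, which is the kind of composition that treating relational parameters as ordinary relations makes expressible.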

Measures are an example of structurally recursive abstraction
functions that map an algebraic data type to an abstract domain,
such as natural numbers or sets. Suter et al. [22] describe
decision procedures for the theory of algebraic data types extended with
abstraction functions to decidable abstract domains. Our encoding
does not require such extensions, since a structural relation directly
translates to an uninterpreted relation in first-order logic. Our
encoding also supports parametric relations, which would otherwise
require higher-order abstraction functions.

Imperative shape analyses have previously used relations to
capture some inductive properties [5], and to describe memory
configurations [9]. However, their applicability has been limited
owing to destructive updates and pointer manipulations in imperative
programs. In [14], Might describes a shape analysis of closures in
higher-order programs. Our type system is capable of describing
some notion of control flow for higher-order functions; e.g., the
order in which the higher-order argument of foldl is applied over
the list. However, inductive relations are conspicuous by their
absence in functional program analysis, despite the fact that such
programs are highly amenable to inductive reasoning. To the best of
our knowledge, our type system is the first to use inductive relations
for performing shape analysis on functional programs.

Logical relations have been used extensively to reason about
contextual equivalence [10]. Whereas a logical relation relates
two terms of a (possibly recursive) type, a structural relation relates
a term of an algebraic type to its constituent values. Parametric
logical relations have also been used to reason about contextual
equivalence for effectful programs [2, 4]. In these efforts, a binary
logical relation that relates effectful expressions is parameterized
by a relation that relates their states. In contrast, a parametric
structural relation is a structural relation over a polymorphic data
type that is parameterized by relations over type variables in the
data type. While the primary purpose of structural relations is to
enable specification and static verification, there is a possibility of
sufficiently equipping our framework to reason about invariance of
arbitrary relations, which is the key to reasoning about contextual
equivalence. This is a possible avenue for future research.

Henglein [8] describes a domain-specific language to define
ordering relations for composite data types such as lists and trees.
However, the notion of order explored is the domain order used
to compare two elements of the same domain, such as a lexicographic
order. In contrast, the order relation in our system describes the
relative ordering of elements in a composite data type.

8. Future Work 

Due to the undecidability of program equivalence in general, it is
impossible for any specification language that is based on a
decidable logic to completely specify functional correctness of all
possible ML programs. The expressivity of our specification language
is inherently bound by the limits imposed by our choice of the
underlying decidable first-order logic. Confinement to relational and
equational theory means that it is not possible to express properties
that rely on specific theories, such as arithmetic. For instance, it is
not possible to write a relational specification that asserts that the
result of folding over a list of integers with (op +) is the sum of
all integers in the list. Further, we restrict ourselves to (parametric)
structural relations over (polymorphic) inductive datatypes in this
work. With this restriction, it may not be possible to express
shape-related properties over arbitrary non-inductive datatypes. For
example, it is currently not possible to assert that, in a random
access array, an element at a smaller index occurs before an element
at a larger index. Nevertheless, these drawbacks can be mitigated by (a)
admitting relations without requiring their equational definitions,
and (b) extending our specification language with theory-specific
artifacts (especially from the theory of arithmetic) in such a way
that the combination remains decidable. We intend to explore both
of these extensions as part of future work.

One noticeable limitation of our current system is the lack of a
general type inference mechanism. Relational specifications that
use parametric relations to express rich invariants are non-trivial
and can be quite verbose, so writing them sometimes requires
considerable manual effort. Providing higher-level abstractions in the
specification language can mitigate the problem by enabling the
programmer to reason directly at the level of properties, rather than
at the level of relations. This approach can be complemented with a
lightweight type inference mechanism based on refinement templates
[20] to reduce the burden of manual annotation. The integration of
such mechanisms within CATALYST is another avenue we anticipate
pursuing.

9. Conclusions 

This paper presents a relational specification language integrated
with a dependent type system that is expressive enough to state
structural invariants on functions over algebraic data types, often to
the extent of full functional correctness. We describe how
parametric relations can be used to enable compositional verification
in the presence of parametric polymorphism and higher-order functions.
We additionally provide a translation mechanism to a decidable
fragment of first-order logic that enables practical type checking.
Experimental results based on an implementation (CATALYST) of
these ideas justify the applicability of our approach.